Yujie Zhang

CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment

Jan 17, 2025

Yating Liu, Yujie Zhang, Ziyu Shan, Yiling Xu

Abstract: In recent years, No-Reference Point Cloud Quality Assessment (NR-PCQA) research has achieved significant progress. However, existing methods mostly seek a direct mapping function from visual data to the Mean Opinion Score (MOS), which contradicts the mechanism of practical subjective evaluation. To address this, we propose a novel language-driven PCQA method named CLIP-PCQA. Considering that human beings prefer to describe visual quality using discrete quality descriptions (e.g., "excellent" and "poor") rather than specific scores, we adopt a retrieval-based mapping strategy to simulate the process of subjective assessment. More specifically, based on the philosophy of CLIP, we calculate the cosine similarity between the visual features and multiple textual features corresponding to different quality descriptions; in this process, an effective contrastive loss and learnable prompts are introduced to enhance the feature extraction. Meanwhile, given the personal limitations and bias in subjective experiments, we further convert the feature similarities into probabilities and consider the Opinion Score Distribution (OSD) rather than a single MOS as the final target. Experimental results show that our CLIP-PCQA outperforms other State-Of-The-Art (SOTA) approaches.
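
The retrieval-based mapping described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the five-level quality scale, the 1 to 5 level scores, the temperature value, and all function names are assumptions made for illustration, and the toy feature vectors stand in for real CLIP embeddings.

```python
import math

# Assumed five-level quality vocabulary; the abstract only names "excellent" and "poor".
QUALITY_LEVELS = ["bad", "poor", "fair", "good", "excellent"]
# Assumed numeric score attached to each level (hypothetical 1-5 scale).
LEVEL_SCORES = [1.0, 2.0, 3.0, 4.0, 5.0]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_distribution(visual_feature, text_features, temperature=0.07):
    """Turn visual-to-text similarities into a probability per quality level.

    The temperature is an assumed hyperparameter, not a value from the paper.
    """
    sims = [cosine_similarity(visual_feature, t) for t in text_features]
    return softmax([s / temperature for s in sims])


def expected_score(probs, scores=LEVEL_SCORES):
    """Collapse a predicted opinion score distribution into a single score."""
    return sum(p * s for p, s in zip(probs, scores))


# Toy usage: one-hot "text features" and a visual feature closest to "good".
toy_text_features = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
toy_visual_feature = [0.1, 0.1, 0.2, 0.9, 0.3]
toy_probs = predict_distribution(toy_visual_feature, toy_text_features)
print(dict(zip(QUALITY_LEVELS, toy_probs)), expected_score(toy_probs))
```

In a training setting, the predicted distribution would be compared against the collected opinion score distribution rather than a single MOS; the loss used for that comparison is not specified in the abstract and is left out here.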

Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation

Dec 15, 2024

Yujie Zhang, Bingyang Cui, Qi Yang, Zhu Li, Yiling Xu

Abstract: Text-to-3D generation has achieved remarkable progress in recent years, yet evaluating these methods remains challenging for two reasons: i) existing benchmarks lack fine-grained evaluation on different prompt categories and evaluation dimensions; ii) previous evaluation metrics only focus on a single aspect (e.g., text-3D alignment) and fail to perform multi-dimensional quality assessment. To address these problems, we first propose a comprehensive benchmark named MATE-3D. The benchmark contains eight well-designed prompt categories that cover single and multiple object generation, resulting in 1,280 generated textured meshes. We have conducted a large-scale subjective experiment from four different evaluation dimensions and collected 107,520 annotations, followed by detailed analyses of the results. Based on MATE-3D, we propose a novel quality evaluator named HyperScore. Utilizing a hypernetwork to generate specified mapping functions for each evaluation dimension, our metric can effectively perform multi-dimensional quality assessment. HyperScore presents superior performance over existing metrics on MATE-3D, making it a promising metric for assessing and improving text-to-3D generation. The project is available at https://mate-3d.github.io/.

Seg-CycleGAN : SAR-to-optical image translation guided by a downstream task

Aug 11, 2024

Hannuo Zhang, Huihui Li, Jiarui Lin, Yujie Zhang, Jianghua Fan, Hang Liu

Abstract: Optical remote sensing and Synthetic Aperture Radar (SAR) remote sensing are crucial for earth observation, offering complementary capabilities. While optical sensors provide high-quality images, they are limited by weather and lighting conditions. In contrast, SAR sensors can operate effectively under adverse conditions. This letter proposes a GAN-based SAR-to-optical image translation method named Seg-CycleGAN, designed to enhance the accuracy of ship target translation by leveraging semantic information from a pre-trained semantic segmentation model. Our method utilizes the downstream task of ship target semantic segmentation to guide the training of the image translation network, improving the quality of the output optical-styled images. The potential of foundation-model-annotated datasets in SAR-to-optical translation tasks is revealed. This work suggests broader research and applications for downstream-task-guided frameworks. The code will be available at https://github.com/NPULHH/

* 8 pages, 5 figures

Asynchronous Feedback Network for Perceptual Point Cloud Quality Assessment

Jul 13, 2024

Yujie Zhang, Qi Yang, Ziyu Shan, Yiling Xu

Abstract: Recent years have witnessed the success of deep learning-based techniques in research on no-reference point cloud quality assessment (NR-PCQA). For a more accurate quality prediction, many previous studies have attempted to capture global and local features in a bottom-up manner, but ignored the interaction and promotion between them. To solve this problem, we propose a novel asynchronous feedback network (AFNet). Motivated by human visual perception mechanisms, AFNet employs a dual-branch structure to deal with global and local features, simulating the left and right hemispheres of the human brain, and constructs a feedback module between them. Specifically, the input point clouds are first fed into a transformer-based global encoder to generate attention maps that highlight semantically rich regions, which are then merged into the global feature. Then, we utilize the generated attention maps to perform dynamic convolution for different semantic regions and obtain the local feature. Finally, a coarse-to-fine strategy is adopted to merge the two features into the final quality score. We conduct comprehensive experiments on three datasets and achieve superior performance over the state-of-the-art approaches on all of these datasets. The code will be available at https://github.com/zhangyujie-1998/AFNet.

Perception-Guided Quality Metric of 3D Point Clouds Using Hybrid Strategy

Jul 04, 2024

Yujie Zhang, Qi Yang, Yiling Xu, Shan Liu

Abstract: Full-reference point cloud quality assessment (FR-PCQA) aims to infer the quality of distorted point clouds with available references. Most of the existing FR-PCQA metrics ignore the fact that the human visual system (HVS) dynamically tackles visual information according to different distortion levels (i.e., distortion detection for high-quality samples and appearance perception for low-quality samples) and measure point cloud quality using unified features. To bridge the gap, in this paper, we propose a perception-guided hybrid metric (PHM) that adaptively leverages two visual strategies with respect to distortion degree to predict point cloud quality: to measure visible difference in high-quality samples, PHM takes into account the masking effect and employs texture complexity as an effective compensatory factor for absolute difference; on the other hand, PHM leverages spectral graph theory to evaluate appearance degradation in low-quality samples. Variations in geometric signals on graphs and changes in the spectral graph wavelet coefficients are utilized to characterize geometry and texture appearance degradation, respectively. Finally, the results obtained from the two components are combined in a non-linear method to produce an overall quality score of the tested point cloud. The results of the experiment on five independent databases show that PHM achieves state-of-the-art (SOTA) performance and offers significant performance improvement in multiple distortion environments. The code is publicly available at https://github.com/zhangyujie-1998/PHM.

A Shared-Aperture Dual-Band sub-6 GHz and mmWave Reconfigurable Intelligent Surface With Independent Operation

Jun 05, 2024

Junhui Rao, Yujie Zhang, Shiwen Tang, Zan Li, Zhaoyang Ming, Jichen Zhang, Chi Yuk Chiu, Ross Murch

Abstract: A novel dual-band reconfigurable intelligent surface (DBI-RIS) design that combines the functionalities of millimeter-wave (mmWave) and sub-6 GHz bands within a single aperture is proposed. This design aims to bridge the gap between current single-band reconfigurable intelligent surfaces (RISs) and wireless systems utilizing sub-6 GHz and mmWave bands that require RIS with independently reconfigurable dual-band operation. The mmWave element is realized by a double-layer patch antenna loaded with 1-bit phase shifters, providing two reconfigurable states. An 8x8 mmWave element array is selectively interconnected using three RF switches to form a reconfigurable sub-6 GHz element at 3.5 GHz. A suspended electromagnetic band gap (EBG) structure is proposed to suppress surface waves and ensure sufficient geometric space for the phase shifter and control networks in the mmWave element. A low-cost planar spiral inductor (PSI) is carefully optimized to connect mmWave elements, enabling the sub-6 GHz function without affecting mmWave operation. Finally, prototypes of the DBI-RIS are fabricated, and experimental verification is conducted using two separate measurement testbeds. The fabricated sub-6 GHz RIS successfully achieves beam steering within the range of -35 to 35 degrees for the DBI-RIS with 4x4 sub-6 GHz elements, while the mmWave RIS demonstrates beam steering from -30 to 30 degrees for the DBI-RIS with 8x8 mmWave elements; both results agree well with simulation.

Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning

May 10, 2024

Yujie Zhang, Neil Gong, Michael K. Reiter

Abstract: Federated Learning (FL) is a decentralized machine learning method that enables participants to collaboratively train a model without sharing their private data. Despite its privacy and scalability benefits, FL is susceptible to backdoor attacks, where adversaries poison the local training data of a subset of clients using a backdoor trigger, aiming to make the aggregated model produce malicious results when the same backdoor condition is met by an inference-time input. Existing backdoor attacks in FL suffer from common deficiencies: fixed trigger patterns and reliance on the assistance of model poisoning. State-of-the-art defenses based on Byzantine-robust aggregation exhibit a good defense performance on these attacks because of the significant divergence between malicious and benign model updates. To effectively conceal malicious model updates among benign ones, we propose DPOT, a backdoor attack strategy in FL that dynamically constructs backdoor objectives by optimizing a backdoor trigger, making backdoor data have minimal effect on model updates. We provide theoretical justifications for DPOT's attacking principle and display experimental results showing that DPOT, via only a data-poisoning attack, effectively undermines state-of-the-art defenses and outperforms existing backdoor attack techniques on various datasets.

Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment

Mar 27, 2024

Ziyu Shan, Yujie Zhang, Qi Yang, Haichen Yang, Yiling Xu, Jenq-Neng Hwang, Xiaozhong Xu, Shan Liu

Abstract: No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference, and has achieved tremendous improvements due to the utilization of deep neural networks. However, learning-based NR-PCQA methods suffer from the scarcity of labeled data and usually perform suboptimally in terms of generalization. To solve the problem, we propose a novel contrastive pre-training framework tailored for PCQA (CoPA), which enables the pre-trained model to learn quality-aware representations from unlabeled data. To obtain anchors in the representation space, we project point clouds with different distortions into images and randomly mix their local patches to form mixed images with multiple distortions. Utilizing the generated anchors, we constrain the pre-training process via a quality-aware contrastive loss following the philosophy that perceptual quality is closely related to both content and distortion. Furthermore, in the model fine-tuning stage, we propose a semantic-guided multi-view fusion module to effectively integrate the features of projected images from multiple perspectives. Extensive experiments show that our method outperforms the state-of-the-art PCQA methods on popular benchmarks. Further investigations demonstrate that CoPA can also benefit existing learning-based PCQA models.
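
The patch-mixing step mentioned in this abstract (randomly mixing local patches of projected images that carry different distortions) can be sketched as follows. This is an illustrative assumption, not the CoPA code: the function name `mix_patches`, the uniform random choice per patch, and the nested-list image representation are all hypothetical.

```python
import random


def mix_patches(images, patch_size, rng=None):
    """Build one mixed image by picking, for every patch location,
    the patch from a randomly chosen source image.

    `images` is a list of equally sized 2D grids (lists of rows), each assumed
    to be a projection of the same content under a different distortion.
    Returns the mixed image and a grid recording the source index per patch.
    """
    rng = rng or random.Random()
    height, width = len(images[0]), len(images[0][0])
    mixed = [[None] * width for _ in range(height)]
    sources = []
    for top in range(0, height, patch_size):
        row_sources = []
        for left in range(0, width, patch_size):
            idx = rng.randrange(len(images))  # uniform choice is an assumption
            row_sources.append(idx)
            # Copy the chosen image's pixels for this patch, clipping at the borders.
            for y in range(top, min(top + patch_size, height)):
                for x in range(left, min(left + patch_size, width)):
                    mixed[y][x] = images[idx][y][x]
        sources.append(row_sources)
    return mixed, sources


# Toy usage: two constant 4x4 "images" standing in for two distortion types.
toy_images = [[[0] * 4 for _ in range(4)], [[1] * 4 for _ in range(4)]]
toy_mixed, toy_sources = mix_patches(toy_images, patch_size=2, rng=random.Random(0))
```

The returned source grid records which distortion each patch came from, which is the kind of information a contrastive objective could use when treating the mixed image as an anchor; how the paper's quality-aware loss actually uses the anchors is not detailed in the abstract.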

PAME: Self-Supervised Masked Autoencoder for No-Reference Point Cloud Quality Assessment

Mar 15, 2024

Ziyu Shan, Yujie Zhang, Qi Yang, Haichen Yang, Yiling Xu, Shan Liu

Abstract: No-reference point cloud quality assessment (NR-PCQA) aims to automatically predict the perceptual quality of point clouds without reference, which has achieved remarkable performance due to the utilization of deep learning-based models. However, these data-driven models suffer from the scarcity of labeled data and perform unsatisfactorily in cross-dataset evaluations. To address this problem, we propose a self-supervised pre-training framework using masked autoencoders (PAME) to help the model learn useful representations without labels. Specifically, after projecting point clouds into images, our PAME employs dual-branch autoencoders, reconstructing masked patches from distorted images into the original patches within reference and distorted images. In this manner, the two branches can separately learn content-aware features and distortion-aware features from the projected images. Furthermore, in the model fine-tuning stage, the learned content-aware features serve as a guide to fuse the point cloud quality features extracted from different perspectives. Extensive experiments show that our method outperforms the state-of-the-art NR-PCQA methods on popular benchmarks in terms of prediction accuracy and generalizability.

Once-Training-All-Fine: No-Reference Point Cloud Quality Assessment via Domain-relevance Degradation Description

Jul 04, 2023

Yipeng Liu, Qi Yang, Yujie Zhang, Yiling Xu, Le Yang, Xiaozhong Xu, Shan Liu

Abstract: Full-reference (FR) point cloud quality assessment (PCQA) has achieved impressive progress in recent years. However, as reference point clouds are not available in many cases, no-reference (NR) metrics have become a research hotspot. Existing NR methods suffer from poor generalization performance. To address this shortcoming, we propose a novel NR-PCQA method, Point Cloud Quality Assessment via Domain-relevance Degradation Description (D$^3$-PCQA). First, we demonstrate our model's interpretability by deriving the function of each module using a kernelized ridge regression model. Specifically, quality assessment can be characterized as a leap from the scattered perceptual domain (reflecting subjective perception) to the ordered quality domain (reflecting mean opinion score). Second, to reduce the significant domain discrepancy, we establish an intermediate domain, the description domain, based on insights from subjective experiments, by considering the domain relevance among samples located in the perception domain and learning a structured latent space. The anchor features derived from the learned latent space are generated as cross-domain auxiliary information to promote domain transformation. Furthermore, the newly established description domain decomposes the NR-PCQA problem into two relevant stages. These stages include a classification stage that gives the degradation descriptions to point clouds and a regression stage to determine the confidence degrees of descriptions, providing a semantic explanation for the predicted quality scores. Experimental results demonstrate that D$^3$-PCQA exhibits robust performance and outstanding generalization ability on several publicly available datasets. The code in this work will be publicly available at https://smt.sjtu.edu.cn.
