Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wen Shen

EfficientHuman: Efficient Training and Reconstruction of Moving Human using Articulated 2D Gaussian

Apr 29, 2025

Hao Tian, Rui Liu, Wen Shen, Yilong Hu, Zhihao Zheng, Xiaolin Qin

Abstract:3D Gaussian Splatting (3DGS) has been recognized as a pioneering technique in scene reconstruction and novel view synthesis. Recent work on reconstructing the 3D human body using 3DGS attempts to leverage prior information on human pose to enhance rendering quality and improve training speed. However, it struggles to effectively fit dynamic surface planes due to multi-view inconsistency and redundant Gaussians. This inconsistency arises because Gaussian ellipsoids cannot accurately represent the surfaces of dynamic objects, which hinders the rapid reconstruction of the dynamic human body. Meanwhile, the prevalence of redundant Gaussians means that the training time of these works is still not ideal for quickly fitting a dynamic human body. To address these, we propose EfficientHuman, a model that quickly accomplishes the dynamic reconstruction of the human body using Articulated 2D Gaussian while ensuring high rendering quality. The key innovation involves encoding Gaussian splats as Articulated 2D Gaussian surfels in canonical space and then transforming them to pose space via Linear Blend Skinning (LBS) to achieve efficient pose transformations. Unlike 3D Gaussians, Articulated 2D Gaussian surfels can quickly conform to the dynamic human body while ensuring view-consistent geometries. Additionally, we introduce a pose calibration module and an LBS optimization module to achieve precise fitting of dynamic human poses, enhancing the model's performance. Extensive experiments on the ZJU-MoCap dataset demonstrate that EfficientHuman achieves rapid 3D dynamic human reconstruction in less than a minute on average, which is 20 seconds faster than the current state-of-the-art method, while also reducing the number of redundant Gaussians.

* 11 pages, 3 figures

Via

Access Paper or Ask Questions

SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI

Mar 25, 2025

Zhiyang Liu, Dong Yang, Minghao Zhang, Hanyu Sun, Hong Wu, Huiying Wang, Wen Shen, Chao Chai, Shuang Xia

Abstract:Despite that deep learning (DL) methods have presented tremendous potential in many medical image analysis tasks, the practical applications of medical DL models are limited due to the lack of enough data samples with manual annotations. By noting that the clinical radiology examinations are associated with radiology reports that describe the images, we propose to develop a foundation model for multi-model head MRI by using contrastive learning on the images and the corresponding radiology findings. In particular, a contrastive learning framework is proposed, where a mixed syntax and semantic similarity matching metric is integrated to reduce the thirst of extreme large dataset in conventional contrastive learning framework. Our proposed similarity enhanced contrastive language image pretraining (SeLIP) is able to effectively extract more useful features. Experiments revealed that our proposed SeLIP performs well in many downstream tasks including image-text retrieval task, classification task, and image segmentation, which highlights the importance of considering the similarities among texts describing different images in developing medical image foundation models.

Via

Access Paper or Ask Questions

M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment

Feb 21, 2025

Chuan Cui, Kejiang Chen, Zhihua Wei, Wen Shen, Weiming Zhang, Nenghai Yu

Abstract:The rapid advancement of AI-generated image (AGI) models has introduced significant challenges in evaluating their quality, which requires considering multiple dimensions such as perceptual quality, prompt correspondence, and authenticity. To address these challenges, we propose M3-AGIQA, a comprehensive framework for AGI quality assessment that is Multimodal, Multi-Round, and Multi-Aspect. Our approach leverages the capabilities of Multimodal Large Language Models (MLLMs) as joint text and image encoders and distills advanced captioning capabilities from online MLLMs into a local model via Low-Rank Adaptation (LoRA) fine-tuning. The framework includes a structured multi-round evaluation mechanism, where intermediate image descriptions are generated to provide deeper insights into the quality, correspondence, and authenticity aspects. To align predictions with human perceptual judgments, a predictor constructed by an xLSTM and a regression head is incorporated to process sequential logits and predict Mean Opinion Scores (MOSs). Extensive experiments conducted on multiple benchmark datasets demonstrate that M3-AGIQA achieves state-of-the-art performance, effectively capturing nuanced aspects of AGI quality. Furthermore, cross-dataset validation confirms its strong generalizability. The code is available at https://github.com/strawhatboy/M3-AGIQA.

* 14 pages, 5 figures. This work has been submitted to the IEEE for possible publication

Via

Access Paper or Ask Questions

Where We Have Arrived in Proving the Emergence of Sparse Symbolic Concepts in AI Models

May 03, 2023

Qihan Ren, Jiayang Gao, Wen Shen, Quanshi Zhang

Abstract:This paper aims to prove the emergence of symbolic concepts in well-trained AI models. We prove that if (1) the high-order derivatives of the model output w.r.t. the input variables are all zero, (2) the AI model can be used on occluded samples and will yield higher confidence when the input sample is less occluded, and (3) the confidence of the AI model does not significantly degrade on occluded samples, then the AI model will encode sparse interactive concepts. Each interactive concept represents an interaction between a specific set of input variables, and has a certain numerical effect on the inference score of the model. Specifically, it is proved that the inference score of the model can always be represented as the sum of the interaction effects of all interactive concepts. In fact, we hope to prove that conditions for the emergence of symbolic concepts are quite common. It means that for most AI models, we can usually use a small number of interactive concepts to mimic the model outputs on any arbitrarily masked samples.

Via

Access Paper or Ask Questions

Can the Inference Logic of Large Language Models be Disentangled into Symbolic Concepts?

Apr 03, 2023

Wen Shen, Lei Cheng, Yuxiao Yang, Mingjie Li, Quanshi Zhang

Figure 1 for Can the Inference Logic of Large Language Models be Disentangled into Symbolic Concepts?

Figure 2 for Can the Inference Logic of Large Language Models be Disentangled into Symbolic Concepts?

Figure 3 for Can the Inference Logic of Large Language Models be Disentangled into Symbolic Concepts?

Figure 4 for Can the Inference Logic of Large Language Models be Disentangled into Symbolic Concepts?

Abstract:In this paper, we explain the inference logic of large language models (LLMs) as a set of symbolic concepts. Many recent studies have discovered that traditional DNNs usually encode sparse symbolic concepts. However, because an LLM has much more parameters than traditional DNNs, whether the LLM also encodes sparse symbolic concepts is still an open problem. Therefore, in this paper, we propose to disentangle the inference score of LLMs for dialogue tasks into a small number of symbolic concepts. We verify that we can use those sparse concepts to well estimate all inference scores of the LLM on all arbitrarily masking states of the input sentence. We also evaluate the transferability of concepts encoded by an LLM and verify that symbolic concepts usually exhibit high transferability across similar input sentences. More crucially, those symbolic concepts can be used to explain the exact reasons accountable for the LLM's prediction errors.

Via

Access Paper or Ask Questions

Concept-Level Explanation for the Generalization of a DNN

Feb 25, 2023

Huilin Zhou, Hao Zhang, Huiqi Deng, Dongrui Liu, Wen Shen, Shih-Han Chan, Quanshi Zhang

Figure 1 for Concept-Level Explanation for the Generalization of a DNN

Figure 2 for Concept-Level Explanation for the Generalization of a DNN

Figure 3 for Concept-Level Explanation for the Generalization of a DNN

Figure 4 for Concept-Level Explanation for the Generalization of a DNN

Abstract:This paper explains the generalization power of a deep neural network (DNN) from the perspective of interactive concepts. Many recent studies have quantified a clear emergence of interactive concepts encoded by the DNN, which have been observed on different DNNs during the learning process. Therefore, in this paper, we investigate the generalization power of each interactive concept, and we use the generalization power of different interactive concepts to explain the generalization power of the entire DNN. Specifically, we define the complexity of each interactive concept. We find that simple concepts can be better generalized to testing data than complex concepts. The DNN with strong generalization power usually learns simple concepts more quickly and encodes fewer complex concepts. More crucially, we discover the detouring dynamics of learning complex concepts, which explain both the high learning difficulty and the low generalization power of complex concepts.

Via

Access Paper or Ask Questions

Defects of Convolutional Decoder Networks in Frequency Representation

Oct 17, 2022

Ling Tang, Wen Shen, Zhanpeng Zhou, Yuefeng Chen, Quanshi Zhang

Figure 1 for Defects of Convolutional Decoder Networks in Frequency Representation

Figure 2 for Defects of Convolutional Decoder Networks in Frequency Representation

Figure 3 for Defects of Convolutional Decoder Networks in Frequency Representation

Figure 4 for Defects of Convolutional Decoder Networks in Frequency Representation

Abstract:In this paper, we prove representation bottlenecks of a cascaded convolutional decoder network, considering the capacity of representing different frequency components of an input sample. We conduct the discrete Fourier transform on each channel of the feature map in an intermediate layer of the decoder network. Then, we introduce the rule of the forward propagation of such intermediate-layer spectrum maps, which is equivalent to the forward propagation of feature maps through a convolutional layer. Based on this, we find that each frequency component in the spectrum map is forward propagated independently with other frequency components. Furthermore, we prove two bottlenecks in representing feature spectrums. First, we prove that the convolution operation, the zero-padding operation, and a set of other settings all make a convolutional decoder network more likely to weaken high-frequency components. Second, we prove that the upsampling operation generates a feature spectrum, in which strong signals repetitively appears at certain frequencies.

Via

Access Paper or Ask Questions

Batch Normalization Is Blind to the First and Second Derivatives of the Loss

Jun 02, 2022

Zhanpeng Zhou, Wen Shen, Huixin Chen, Ling Tang, Quanshi Zhang

Figure 1 for Batch Normalization Is Blind to the First and Second Derivatives of the Loss

Figure 2 for Batch Normalization Is Blind to the First and Second Derivatives of the Loss

Figure 3 for Batch Normalization Is Blind to the First and Second Derivatives of the Loss

Figure 4 for Batch Normalization Is Blind to the First and Second Derivatives of the Loss

Abstract:In this paper, we prove the effects of the BN operation on the back-propagation of the first and second derivatives of the loss. When we do the Taylor series expansion of the loss function, we prove that the BN operation will block the influence of the first-order term and most influence of the second-order term of the loss. We also find that such a problem is caused by the standardization phase of the BN operation. Experimental results have verified our theoretical conclusions, and we have found that the BN operation significantly affects feature representations in specific tasks, where losses of different samples share similar analytic formulas.

Via

Access Paper or Ask Questions

Why Adversarial Training of ReLU Networks Is Difficult?

May 30, 2022

Xu Cheng, Hao Zhang, Yue Xin, Wen Shen, Jie Ren, Quanshi Zhang

Figure 1 for Why Adversarial Training of ReLU Networks Is Difficult?

Figure 2 for Why Adversarial Training of ReLU Networks Is Difficult?

Figure 3 for Why Adversarial Training of ReLU Networks Is Difficult?

Figure 4 for Why Adversarial Training of ReLU Networks Is Difficult?

Abstract:This paper mathematically derives an analytic solution of the adversarial perturbation on a ReLU network, and theoretically explains the difficulty of adversarial training. Specifically, we formulate the dynamics of the adversarial perturbation generated by the multi-step attack, which shows that the adversarial perturbation tends to strengthen eigenvectors corresponding to a few top-ranked eigenvalues of the Hessian matrix of the loss w.r.t. the input. We also prove that adversarial training tends to strengthen the influence of unconfident input samples with large gradient norms in an exponential manner. Besides, we find that adversarial training strengthens the influence of the Hessian matrix of the loss w.r.t. network parameters, which makes the adversarial training more likely to oscillate along directions of a few samples, and boosts the difficulty of adversarial training. Crucially, our proofs provide a unified explanation for previous findings in understanding adversarial training.

Via

Access Paper or Ask Questions

Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Nov 05, 2021

Wen Shen, Qihan Ren, Dongrui Liu, Quanshi Zhang

Figure 1 for Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Figure 2 for Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Figure 3 for Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Figure 4 for Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Abstract:In this paper, we evaluate the quality of knowledge representations encoded in deep neural networks (DNNs) for 3D point cloud processing. We propose a method to disentangle the overall model vulnerability into the sensitivity to the rotation, the translation, the scale, and local 3D structures. Besides, we also propose metrics to evaluate the spatial smoothness of encoding 3D structures, and the representation complexity of the DNN. Based on such analysis, experiments expose representation problems with classic DNNs, and explain the utility of the adversarial training.

Via

Access Paper or Ask Questions