Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhenzhong Wang

NexusSplats: Efficient 3D Gaussian Splatting in the Wild

Nov 26, 2024

Yuzhou Tang, Dejun Xu, Yongjie Hou, Zhenzhong Wang, Min Jiang

Abstract:While 3D Gaussian Splatting (3DGS) has recently demonstrated remarkable rendering quality and efficiency in 3D scene reconstruction, it struggles with varying lighting conditions and incidental occlusions in real-world scenarios. To accommodate varying lighting conditions, existing 3DGS extensions apply color mapping to the massive Gaussian primitives with individually optimized appearance embeddings. To handle occlusions, they predict pixel-wise uncertainties via 2D image features for occlusion capture. Nevertheless, such massive color mapping and pixel-wise uncertainty prediction strategies suffer from not only additional computational costs but also coarse-grained lighting and occlusion handling. In this work, we propose a nexus kernel-driven approach, termed NexusSplats, for efficient and finer 3D scene reconstruction under complex lighting and occlusion conditions. In particular, NexusSplats leverages a novel light decoupling strategy where appearance embeddings are optimized based on nexus kernels instead of massive Gaussian primitives, thus accelerating reconstruction speeds while ensuring local color consistency for finer textures. Additionally, a Gaussian-wise uncertainty mechanism is developed, aligning 3D structures with 2D image features for fine-grained occlusion handling. Experimental results demonstrate that NexusSplats achieves state-of-the-art rendering quality while reducing reconstruction time by up to 70.4% compared to the current best in quality.

* error data in the v1 version

Via

Access Paper or Ask Questions

Phys4DGen: A Physics-Driven Framework for Controllable and Efficient 4D Content Generation from a Single Image

Nov 25, 2024

Jiajing Lin, Zhenzhong Wang, Shu Jiang, Yongjie Hou, Min Jiang

Abstract:The task of 4D content generation involves creating dynamic 3D models that evolve over time in response to specific input conditions, such as images. Existing methods rely heavily on pre-trained video diffusion models to guide 4D content dynamics, but these approaches often fail to capture essential physical principles, as video diffusion models lack a robust understanding of real-world physics. Moreover, these models face challenges in providing fine-grained control over dynamics and exhibit high computational costs. In this work, we propose Phys4DGen, a novel, high-efficiency framework that generates physics-compliant 4D content from a single image with enhanced control capabilities. Our approach uniquely integrates physical simulations into the 4D generation pipeline, ensuring adherence to fundamental physical laws. Inspired by the human ability to infer physical properties visually, we introduce a Physical Perception Module (PPM) that discerns the material properties and structural components of the 3D object from the input image, facilitating accurate downstream simulations. Phys4DGen significantly accelerates the 4D generation process by eliminating iterative optimization steps in the dynamics modeling phase. It allows users to intuitively control the movement speed and direction of generated 4D content by adjusting external forces, achieving finely tunable, physically plausible animations. Extensive evaluations show that Phys4DGen outperforms existing methods in both inference speed and physical realism, producing high-quality, controllable 4D content.

Via

Access Paper or Ask Questions

Cross-Modality Attack Boosted by Gradient-Evolutionary Multiform Optimization

Sep 26, 2024

Yunpeng Gong, Qingyuan Zeng, Dejun Xu, Zhenzhong Wang, Min Jiang

Figure 1 for Cross-Modality Attack Boosted by Gradient-Evolutionary Multiform Optimization

Figure 2 for Cross-Modality Attack Boosted by Gradient-Evolutionary Multiform Optimization

Figure 3 for Cross-Modality Attack Boosted by Gradient-Evolutionary Multiform Optimization

Figure 4 for Cross-Modality Attack Boosted by Gradient-Evolutionary Multiform Optimization

Abstract:In recent years, despite significant advancements in adversarial attack research, the security challenges in cross-modal scenarios, such as the transferability of adversarial attacks between infrared, thermal, and RGB images, have been overlooked. These heterogeneous image modalities collected by different hardware devices are widely prevalent in practical applications, and the substantial differences between modalities pose significant challenges to attack transferability. In this work, we explore a novel cross-modal adversarial attack strategy, termed multiform attack. We propose a dual-layer optimization framework based on gradient-evolution, facilitating efficient perturbation transfer between modalities. In the first layer of optimization, the framework utilizes image gradients to learn universal perturbations within each modality and employs evolutionary algorithms to search for shared perturbations with transferability across different modalities through secondary optimization. Through extensive testing on multiple heterogeneous datasets, we demonstrate the superiority and robustness of Multiform Attack compared to existing techniques. This work not only enhances the transferability of cross-modal adversarial attacks but also provides a new perspective for understanding security vulnerabilities in cross-modal systems.

Via

Access Paper or Ask Questions

Phy124: Fast Physics-Driven 4D Content Generation from a Single Image

Sep 11, 2024

Jiajing Lin, Zhenzhong Wang, Yongjie Hou, Yuzhou Tang, Min Jiang

Abstract:4D content generation focuses on creating dynamic 3D objects that change over time. Existing methods primarily rely on pre-trained video diffusion models, utilizing sampling processes or reference videos. However, these approaches face significant challenges. Firstly, the generated 4D content often fails to adhere to real-world physics since video diffusion models do not incorporate physical priors. Secondly, the extensive sampling process and the large number of parameters in diffusion models result in exceedingly time-consuming generation processes. To address these issues, we introduce Phy124, a novel, fast, and physics-driven method for controllable 4D content generation from a single image. Phy124 integrates physical simulation directly into the 4D generation process, ensuring that the resulting 4D content adheres to natural physical laws. Phy124 also eliminates the use of diffusion models during the 4D dynamics generation phase, significantly speeding up the process. Phy124 allows for the control of 4D dynamics, including movement speed and direction, by manipulating external forces. Extensive experiments demonstrate that Phy124 generates high-fidelity 4D content with significantly reduced inference times, achieving stateof-the-art performance. The code and generated 4D content are available at the provided link: https://anonymous.4open.science/r/BBF2/.

Via

Access Paper or Ask Questions

Adversarial Learning for Neural PDE Solvers with Sparse Data

Sep 04, 2024

Yunpeng Gong, Yongjie Hou, Zhenzhong Wang, Zexin Lin, Min Jiang

Figure 1 for Adversarial Learning for Neural PDE Solvers with Sparse Data

Figure 2 for Adversarial Learning for Neural PDE Solvers with Sparse Data

Figure 3 for Adversarial Learning for Neural PDE Solvers with Sparse Data

Figure 4 for Adversarial Learning for Neural PDE Solvers with Sparse Data

Abstract:Neural network solvers for partial differential equations (PDEs) have made significant progress, yet they continue to face challenges related to data scarcity and model robustness. Traditional data augmentation methods, which leverage symmetry or invariance, impose strong assumptions on physical systems that often do not hold in dynamic and complex real-world applications. To address this research gap, this study introduces a universal learning strategy for neural network PDEs, named Systematic Model Augmentation for Robust Training (SMART). By focusing on challenging and improving the model's weaknesses, SMART reduces generalization error during training under data-scarce conditions, leading to significant improvements in prediction accuracy across various PDE scenarios. The effectiveness of the proposed method is demonstrated through both theoretical analysis and extensive experimentation. The code will be available.

Via

Access Paper or Ask Questions

Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models

Aug 16, 2024

Qingyuan Zeng, Zhenzhong Wang, Yiu-ming Cheung, Min Jiang

Figure 1 for Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models

Figure 2 for Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models

Figure 3 for Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models

Figure 4 for Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models

Abstract:While image-to-text models have demonstrated significant advancements in various vision-language tasks, they remain susceptible to adversarial attacks. Existing white-box attacks on image-to-text models require access to the architecture, gradients, and parameters of the target model, resulting in low practicality. Although the recently proposed gray-box attacks have improved practicality, they suffer from semantic loss during the training process, which limits their targeted attack performance. To advance adversarial attacks of image-to-text models, this paper focuses on a challenging scenario: decision-based black-box targeted attacks where the attackers only have access to the final output text and aim to perform targeted attacks. Specifically, we formulate the decision-based black-box targeted attack as a large-scale optimization problem. To efficiently solve the optimization problem, a three-stage process \textit{Ask, Attend, Attack}, called \textit{AAA}, is proposed to coordinate with the solver. \textit{Ask} guides attackers to create target texts that satisfy the specific semantics. \textit{Attend} identifies the crucial regions of the image for attacking, thus reducing the search space for the subsequent \textit{Attack}. \textit{Attack} uses an evolutionary algorithm to attack the crucial regions, where the attacks are semantically related to the target texts of \textit{Ask}, thus achieving targeted attacks without semantic loss. Experimental results on transformer-based and CNN+RNN-based image-to-text models confirmed the effectiveness of our proposed \textit{AAA}.

Via

Access Paper or Ask Questions

Explainable Molecular Property Prediction: Aligning Chemical Concepts with Predictions via Language Models

May 25, 2024

Zhenzhong Wang, Zehui Lin, Wanyu Lin, Ming Yang, Minggang Zeng, Kay Chen Tan

Abstract:Providing explainable molecule property predictions is critical for many scientific domains, such as drug discovery and material science. Though transformer-based language models have shown great potential in accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal the molecular structure-property relationships. In this work, we develop a new framework for explainable molecular property prediction based on language models, dubbed as Lamole, which can provide chemical concepts-aligned explanations. We first leverage a designated molecular representation -- the Group SELFIES -- as it can provide chemically meaningful semantics. Because attention mechanisms in Transformers can inherently capture relationships within the input, we further incorporate the attention weights and gradients together to generate explanations for capturing the functional group interactions. We then carefully craft a marginal loss to explicitly optimize the explanations to be able to align with the chemists' annotations. We bridge the manifold hypothesis with the elaborated marginal loss to prove that the loss can align the explanations with the tangent space of the data manifold, leading to concept-aligned explanations. Experimental results over six mutagenicity datasets and one hepatotoxicity dataset demonstrate Lamole can achieve comparable classification accuracy and boost the explanation accuracy by up to 14.8%, being the state-of-the-art in explainable molecular property prediction.

Via

Access Paper or Ask Questions

Multi-View Subgraph Neural Networks: Self-Supervised Learning with Scarce Labeled Data

Apr 19, 2024

Zhenzhong Wang, Qingyuan Zeng, Wanyu Lin, Min Jiang, Kay Chen Tan

Abstract:While graph neural networks (GNNs) have become the de-facto standard for graph-based node classification, they impose a strong assumption on the availability of sufficient labeled samples. This assumption restricts the classification performance of prevailing GNNs on many real-world applications suffering from low-data regimes. Specifically, features extracted from scarce labeled nodes could not provide sufficient supervision for the unlabeled samples, leading to severe over-fitting. In this work, we point out that leveraging subgraphs to capture long-range dependencies can augment the representation of a node with homophily properties, thus alleviating the low-data regime. However, prior works leveraging subgraphs fail to capture the long-range dependencies among nodes. To this end, we present a novel self-supervised learning framework, called multi-view subgraph neural networks (Muse), for handling long-range dependencies. In particular, we propose an information theory-based identification mechanism to identify two types of subgraphs from the views of input space and latent space, respectively. The former is to capture the local structure of the graph, while the latter captures the long-range dependencies among nodes. By fusing these two views of subgraphs, the learned representations can preserve the topological properties of the graph at large, including the local structure and long-range dependencies, thus maximizing their expressiveness for downstream node classification tasks. Experimental results show that Muse outperforms the alternative methods on node classification tasks with limited labeled data.

Via

Access Paper or Ask Questions

Manifold Interpolation for Large-Scale Multi-Objective Optimization via Generative Adversarial Networks

Jan 08, 2021

Zhenzhong Wang, Haokai Hong, Kai Ye, Min Jiang, Kay Chen Tan

Figure 1 for Manifold Interpolation for Large-Scale Multi-Objective Optimization via Generative Adversarial Networks

Figure 2 for Manifold Interpolation for Large-Scale Multi-Objective Optimization via Generative Adversarial Networks

Figure 3 for Manifold Interpolation for Large-Scale Multi-Objective Optimization via Generative Adversarial Networks

Figure 4 for Manifold Interpolation for Large-Scale Multi-Objective Optimization via Generative Adversarial Networks

Abstract:Large-scale multiobjective optimization problems (LSMOPs) are characterized as involving hundreds or even thousands of decision variables and multiple conflicting objectives. An excellent algorithm for solving LSMOPs should find Pareto-optimal solutions with diversity and escape from local optima in the large-scale search space. Previous research has shown that these optimal solutions are uniformly distributed on the manifold structure in the low-dimensional space. However, traditional evolutionary algorithms for solving LSMOPs have some deficiencies in dealing with this structural manifold, resulting in poor diversity, local optima, and inefficient searches. In this work, a generative adversarial network (GAN)-based manifold interpolation framework is proposed to learn the manifold and generate high-quality solutions on this manifold, thereby improving the performance of evolutionary algorithms. We compare the proposed algorithm with several state-of-the-art algorithms on large-scale multiobjective benchmark functions. Experimental results have demonstrated the significant improvements achieved by this framework in solving LSMOPs.

Via

Access Paper or Ask Questions

Evolutionary Dynamic Multi-objective Optimization Via Regression Transfer Learning

Oct 22, 2019

Zhenzhong Wang, Min Jiang, Xing Gao, Liang Feng, Weizhen Hu, Kay Chen Tan

Figure 1 for Evolutionary Dynamic Multi-objective Optimization Via Regression Transfer Learning

Figure 2 for Evolutionary Dynamic Multi-objective Optimization Via Regression Transfer Learning

Figure 3 for Evolutionary Dynamic Multi-objective Optimization Via Regression Transfer Learning

Figure 4 for Evolutionary Dynamic Multi-objective Optimization Via Regression Transfer Learning

Abstract:Dynamic multi-objective optimization problems (DMOPs) remain a challenge to be settled, because of conflicting objective functions change over time. In recent years, transfer learning has been proven to be a kind of effective approach in solving DMOPs. In this paper, a novel transfer learning based dynamic multi-objective optimization algorithm (DMOA) is proposed called regression transfer learning prediction based DMOA (RTLP-DMOA). The algorithm aims to generate an excellent initial population to accelerate the evolutionary process and improve the evolutionary performance in solving DMOPs. When an environmental change is detected, a regression transfer learning prediction model is constructed by reusing the historical population, which can predict objective values. Then, with the assistance of this prediction model, some high-quality solutions with better predicted objective values are selected as the initial population, which can improve the performance of the evolutionary process. We compare the proposed algorithm with three state-of-the-art algorithms on benchmark functions. Experimental results indicate that the proposed algorithm can significantly enhance the performance of static multi-objective optimization algorithms and is competitive in convergence and diversity.

Via

Access Paper or Ask Questions