Abstract:In this study, we explore the essential challenge of fast scene optimization for Gaussian Splatting. Through a thorough analysis of the geometry modeling process, we reveal that dense point clouds can be effectively reconstructed early in optimization through Gaussian representations. This insight leads to our approach of aggressive Gaussian densification, which provides a more efficient alternative to conventional progressive densification methods. By significantly increasing the number of critical Gaussians, we enhance the model's capacity to capture dense scene geometry at an early stage of optimization. This strategy is seamlessly integrated into the Mini-Splatting densification and simplification framework, enabling rapid convergence without compromising quality. Additionally, we introduce visibility culling within Gaussian Splatting, leveraging per-view Gaussian importance as precomputed visibility to accelerate optimization. Our Mini-Splatting2 achieves a balanced trade-off among optimization time, the number of Gaussians, and rendering quality, establishing a strong baseline for future Gaussian-Splatting-based work. Our work sets the stage for more efficient, high-quality 3D scene modeling in real-world applications, and the code will be made available regardless of acceptance.
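The abstract gives no implementation details, so the following is only a hypothetical sketch of the aggressive-densification idea: rather than growing the Gaussian set gradually, a large fraction of high-importance Gaussians is cloned early in optimization. The function name, the gradient-based importance score, and the fraction and jitter values below are illustrative assumptions, not Mini-Splatting2's actual procedure.

```python
import torch

def aggressive_densify(means, scales, view_grads, top_frac=0.2):
    """Hypothetical sketch of aggressive densification: instead of growing the
    point set progressively, clone a large fraction of high-gradient Gaussians
    at an early iteration so dense geometry is available from the start.
    `view_grads` is assumed to hold accumulated screen-space gradient norms."""
    n = means.shape[0]
    k = max(1, int(top_frac * n))
    idx = torch.topk(view_grads, k).indices          # most "important" Gaussians
    # Clone selected Gaussians with a small positional jitter proportional to scale.
    jitter = torch.randn(k, 3) * scales[idx]
    new_means = torch.cat([means, means[idx] + jitter], dim=0)
    new_scales = torch.cat([scales, scales[idx] * 0.8], dim=0)
    return new_means, new_scales

# Toy usage with random data.
means = torch.randn(1000, 3)
scales = torch.rand(1000, 3) * 0.05
grads = torch.rand(1000)
means, scales = aggressive_densify(means, scales, grads)
print(means.shape)  # torch.Size([1200, 3])
```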
Abstract:The core of self-supervised point cloud learning lies in setting up appropriate pretext tasks to construct a pre-training framework that enables the encoder to perceive 3D objects effectively. In this paper, we integrate two prevalent methods, masked point modeling (MPM) and 3D-to-2D generation, as pretext tasks within a pre-training framework. We leverage the spatial awareness and precise supervision offered by these two methods to address their respective limitations: ambiguous supervision signals and insensitivity to geometric information. Specifically, the proposed framework, abbreviated as PointCG, consists of a Hidden Point Completion (HPC) module and an Arbitrary-view Image Generation (AIG) module. We first capture visible points from arbitrary views as inputs by removing hidden points. Then, HPC extracts representations of the inputs with an encoder and completes the entire shape with a decoder, while AIG generates rendered images based on the visible points' representations. Extensive experiments demonstrate the superiority of the proposed method over the baselines in various downstream tasks. Our code will be made available upon acceptance.
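As one concrete way the visible-point capture step could be realized, the sketch below applies the classic spherical-flip hidden point removal operator (Katz et al.); PointCG's own hidden-point removal may differ, and the radius factor here is an arbitrary choice.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hidden_point_removal(points, camera, radius_factor=100.0):
    """Sketch of the spherical-flip HPR operator (Katz et al.): flip points onto a
    large sphere around the camera and keep those lying on the convex hull.
    Returns indices of points estimated to be visible from `camera`."""
    p = points - camera                          # move camera to origin
    norm = np.linalg.norm(p, axis=1, keepdims=True)
    R = norm.max() * radius_factor               # flipping-sphere radius
    flipped = p + 2.0 * (R - norm) * p / norm    # spherical flip
    hull = ConvexHull(np.vstack([flipped, np.zeros((1, 3))]))
    visible = hull.vertices[hull.vertices < len(points)]
    return visible

pts = np.random.rand(2048, 3)
vis_idx = hidden_point_removal(pts, camera=np.array([0.0, 0.0, 3.0]))
print(len(vis_idx), "visible points")
```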
Abstract:This study presents a novel approach for quantitatively reconstructing density fields from shadowgraph images using physics-informed neural networks.
Abstract:In the field of materials science, exploring the relationship between composition, microstructure, and properties has long been a critical research focus. The mechanical performance of solid-solution Mg-Gd alloys is significantly influenced by Gd content, dendritic structures, and the presence of secondary phases. To better analyze and predict the impact of these factors, this study proposes a multimodal fusion learning framework based on image processing and deep learning techniques. This framework integrates both elemental composition and microstructural features to accurately predict the Vickers hardness of solid-solution Mg-Gd alloys. Initially, deep learning methods were employed to extract microstructural information from a variety of solid-solution Mg-Gd alloy images obtained from the literature and experiments, providing precise grain size and secondary-phase microstructural features for the performance prediction task. Subsequently, these quantitative analysis results were combined with Gd content information to construct a performance prediction dataset. Finally, a regression model based on the Transformer architecture was used to predict the Vickers hardness of Mg-Gd alloys. The experimental results indicate that the Transformer model performs best in terms of prediction accuracy, achieving an R^2 value of 0.9. Additionally, SHAP analysis identified critical values for four key features affecting the Vickers hardness of Mg-Gd alloys, providing valuable guidance for alloy design. These findings not only enhance the understanding of alloy performance but also offer theoretical support for future material design and optimization.
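A minimal sketch, under assumed dimensions and feature choices, of a Transformer-based regressor of the kind described (scalar features such as Gd content, grain size, and secondary-phase fraction mapped to Vickers hardness); the paper's actual architecture, feature encoding, and training setup are not reproduced here.

```python
import torch
import torch.nn as nn

class HardnessRegressor(nn.Module):
    """Sketch of a Transformer regressor over tabular alloy features; all
    hyperparameters below are illustrative assumptions."""
    def __init__(self, n_features=4, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(1, d_model)              # each scalar feature -> one token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=64,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model * n_features, 1)  # regress Vickers hardness

    def forward(self, x):                               # x: (batch, n_features)
        tokens = self.embed(x.unsqueeze(-1))            # (batch, n_features, d_model)
        h = self.encoder(tokens)
        return self.head(h.flatten(1)).squeeze(-1)

model = HardnessRegressor()
features = torch.rand(8, 4)          # toy batch of 8 alloys, 4 features each
print(model(features).shape)         # torch.Size([8])
```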
Abstract:This study presents a novel approach to reconstructing density fields from shadowgraph images using a physics-informed framework. By integrating traditional shadowgraph imaging techniques with physics-informed neural networks (PINNs), we effectively capture refractive index variations within complex flow fields. The proposed method addresses the inherent challenges of shadowgraphy, such as noise and limited spatial resolution, enabling accurate visualization of fluid dynamics. Experimental results demonstrate the feasibility and robustness of our approach, with significant agreement observed between the reconstructed density fields and experimental measurements. This research contributes to the advancement of non-intrusive diagnostic techniques in fluid mechanics and enhances our understanding of flow structures in various applications.
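As a rough, hedged illustration of how a PINN can couple a density network to the shadowgraph forward model, the sketch below penalizes the mismatch between the transverse Laplacian of the refractive index (linked to density via the Gladstone-Dale relation) and a placeholder shadowgraph signal; the paper's actual loss terms, optical geometry, and boundary conditions are not shown.

```python
import torch
import torch.nn as nn

# Shadowgraph contrast is roughly proportional to the transverse Laplacian of the
# refractive index, with n = 1 + K*rho (Gladstone-Dale). Everything below is an
# illustrative assumption, not the paper's implementation.
K = 2.26e-4  # example Gladstone-Dale constant for air (m^3/kg)

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))                  # density field rho(x, y)

def laplacian_n(xy):
    xy = xy.requires_grad_(True)
    n = 1.0 + K * net(xy)                              # refractive index field
    g = torch.autograd.grad(n.sum(), xy, create_graph=True)[0]
    d2 = 0.0
    for i in range(2):                                 # d2n/dx2 + d2n/dy2
        d2 = d2 + torch.autograd.grad(g[:, i].sum(), xy, create_graph=True)[0][:, i]
    return d2

xy = torch.rand(256, 2)                                # sample points in the field
target = torch.zeros(256)                              # placeholder shadowgraph signal
loss = ((laplacian_n(xy) - target) ** 2).mean()        # physics/data residual
loss.backward()
```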
Abstract:With the rapid development of large language models (LLMs), understanding the capability of LLMs to identify unsafe content has become increasingly important. While previous works have introduced several benchmarks to evaluate the safety risks of LLMs, the community still has a limited understanding of current LLMs' capability to recognize illegal and unsafe content in Chinese contexts. In this work, we present a Chinese safety benchmark (ChineseSafe) to facilitate research on the content safety of large language models. To align with the regulations for Chinese Internet content moderation, our ChineseSafe contains 205,034 examples across 4 classes and 10 sub-classes of safety issues. For Chinese contexts, we add several special types of illegal content: political sensitivity, pornography, and variant/homophonic words. Moreover, we employ two methods to evaluate the legal risks of popular LLMs, including open-source models and APIs. The results reveal that many LLMs are vulnerable to certain types of safety issues, leading to legal risks in China. Our work provides a guideline for developers and researchers to facilitate the safety of LLMs. Our results are also available at https://huggingface.co/spaces/SUSTech/ChineseSafe-Benchmark.
Abstract:Point cloud registration is a foundational task for 3D alignment and reconstruction applications. While both traditional and learning-based registration approaches have succeeded, leveraging the intrinsic symmetry of point cloud data, including rotation equivariance, has received insufficient attention. This prevents the model from learning effectively, requiring more training data and increased model complexity. To address these challenges, we propose a graph neural network model with a local SE(3) (special Euclidean group in 3D) equivariance property, realized through SE(3) message-passing-based propagation. Our model is composed mainly of a descriptor module, equivariant graph layers, match similarity, and final regression layers. This modular design enables us to use sparsely sampled input points and to easily initialize the descriptor with self-trained or pre-trained geometric feature descriptors. Experiments conducted on the 3DMatch and KITTI datasets demonstrate the compelling and robust performance of our model compared to state-of-the-art approaches, while keeping model complexity relatively low.
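The abstract does not specify the equivariant layer design, so the sketch below shows a generic EGNN-style message-passing layer that uses only SE(3)-invariant edge distances, making the learned features invariant to rigid motions of the input; the descriptor module, match-similarity, and regression heads are omitted, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class EquivariantLayer(nn.Module):
    """EGNN-style sketch of a message-passing layer over a point-cloud graph that
    conditions only on SE(3)-invariant squared edge lengths."""
    def __init__(self, dim=32):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.SiLU(),
                                 nn.Linear(dim, dim))
        self.upd = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(),
                                 nn.Linear(dim, dim))

    def forward(self, h, x, edges):                 # h: (N, dim) features, x: (N, 3) coords
        src, dst = edges                            # (E,), (E,) node indices
        d2 = ((x[src] - x[dst]) ** 2).sum(-1, keepdim=True)   # invariant edge length
        m = self.msg(torch.cat([h[src], h[dst], d2], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)       # sum messages per node
        return self.upd(torch.cat([h, agg], dim=-1))          # invariant feature update

h = torch.randn(100, 32)
x = torch.randn(100, 3)
edges = (torch.randint(0, 100, (500,)), torch.randint(0, 100, (500,)))
print(EquivariantLayer()(h, x, edges).shape)       # torch.Size([100, 32])
```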
Abstract:Bilevel optimization problems comprise an upper-level optimization task that contains a lower-level optimization task as a constraint. While there is a significant and growing literature devoted to solving bilevel problems with a single objective at both levels using evolutionary computation, relatively little work addresses problems with multiple objectives (BLMOPs) at both levels. For black-box BLMOPs, existing evolutionary techniques typically rely on nested search, which in its native form consumes a large number of function evaluations. In this work, we propose to reduce this expense by directly predicting the lower-level Pareto set for a candidate upper-level solution, instead of conducting an optimization from scratch. Such a prediction is particularly challenging for BLMOPs because it involves a one-to-many mapping. We resolve this bottleneck by supplementing the dataset with a helper variable and constructing a neural network that can then be trained to map the variables in a meaningful manner. We then embed this initialization within a bilevel optimization framework, termed Pareto set prediction assisted evolutionary bilevel multi-objective optimization (PSP-BLEMO). Systematic experiments against existing state-of-the-art methods are presented to demonstrate its benefit. The experiments show that the proposed approach is competitive across a range of problems, both deceptive and non-deceptive.
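A minimal sketch of the Pareto-set-prediction idea: a helper variable t in [0, 1] is appended to the upper-level variables so that the otherwise one-to-many mapping to the lower-level Pareto set becomes a learnable function; the network size, training data, and sampling of t below are illustrative assumptions rather than PSP-BLEMO's actual setup.

```python
import torch
import torch.nn as nn

# (x_u, t) -> x_l pairs would normally come from previously solved lower-level
# problems; random placeholders are used here.
n_upper, n_lower = 5, 4
net = nn.Sequential(nn.Linear(n_upper + 1, 64), nn.ReLU(),
                    nn.Linear(64, n_lower))

xu = torch.rand(512, n_upper)                      # upper-level solutions
t = torch.rand(512, 1)                             # helper variable indexing the front
xl = torch.rand(512, n_lower)                      # placeholder Pareto-set samples

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(torch.cat([xu, t], dim=1)), xl)
    loss.backward()
    opt.step()

# Predict an approximate lower-level Pareto set for a new upper-level solution
# by sweeping the helper variable.
xu_new = torch.rand(1, n_upper).repeat(32, 1)
ts = torch.linspace(0, 1, 32).unsqueeze(1)
pareto_guess = net(torch.cat([xu_new, ts], dim=1))
print(pareto_guess.shape)                          # torch.Size([32, 4])
```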
Abstract:The goal of image style transfer is to render an image guided by a style reference while maintaining the original content. Existing image-guided methods rely on specific style reference images, restricting their wider application and potentially compromising result quality. As a flexible alternative, text-guided methods allow users to describe the desired style using text prompts. Despite their versatility, these methods often struggle with maintaining style consistency, reflecting the described style accurately, and preserving the content of the target image. To address these challenges, we introduce FAGStyle, a zero-shot text-guided diffusion image style transfer method. Our approach enhances inter-patch information interaction by incorporating the Sliding Window Crop technique and Feature Augmentation on Geodesic Surface into our style control loss. Furthermore, we integrate a Pre-Shape self-correlation consistency loss to ensure content consistency. FAGStyle demonstrates superior performance over existing methods, consistently achieving stylization that retains the semantic content of the source image. Experimental results confirm the efficacy of FAGStyle across a diverse range of source contents and styles, both imagined and common.
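As an illustration of the Sliding Window Crop step, the sketch below extracts overlapping patches from an image tensor so that a patch-wise style loss could compare them; the Feature Augmentation on Geodesic Surface and Pre-Shape self-correlation losses are not shown, and the patch size and stride are arbitrary.

```python
import torch

def sliding_window_crop(img, patch=64, stride=32):
    """Extract overlapping patches from a (C, H, W) image so that a patch-wise
    style loss can encourage inter-patch consistency."""
    c, h, w = img.shape
    patches = img.unfold(1, patch, stride).unfold(2, patch, stride)  # (C, nH, nW, p, p)
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c, patch, patch)

img = torch.rand(3, 256, 256)
print(sliding_window_crop(img).shape)   # torch.Size([49, 3, 64, 64])
```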
Abstract:Nowadays, misinformation spreads widely across various social media platforms and has extremely negative impacts on society. To combat this issue, automatically identifying misinformation, especially misinformation containing multimodal content, has attracted growing attention from the academic and industrial communities, giving rise to an active research topic named Multimodal Misinformation Detection (MMD). Typically, existing MMD methods capture the semantic correlation and inconsistency between multiple modalities but neglect some potential clues in multimodal content. Recent studies suggest that manipulation traces in the images of articles are non-trivial clues for detecting misinformation. Meanwhile, we find that the underlying intentions behind the manipulation, e.g., harmful and harmless, also matter in MMD. Accordingly, in this work, we propose to detect misinformation by learning manipulation features that indicate whether the image has been manipulated, as well as intention features regarding the harmful and harmless intentions of the manipulation. Unfortunately, the manipulation and intention labels that would make these features discriminative are unknown. To overcome this problem, we propose two weakly supervised signals as alternatives by introducing additional datasets on image manipulation detection and formulating the two classification tasks as positive-unlabeled learning problems. Based on these ideas, we propose a novel MMD method, namely Harmfully Manipulated Images Matter in MMD (HAMI-M3D). Extensive experiments on three benchmark datasets demonstrate that HAMI-M3D consistently improves the performance of MMD baselines.
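Since the manipulation and intention tasks are cast as positive-unlabeled learning, the sketch below shows a generic non-negative PU risk estimator (Kiryo et al.) as one way such a classifier could be trained; the class prior, surrogate loss, and label convention are placeholders rather than HAMI-M3D's actual formulation.

```python
import torch

def nnpu_loss(scores, labels, prior=0.3):
    """Non-negative PU risk estimator (Kiryo et al.) for training with positive (1)
    and unlabeled (0) examples; `prior` is an assumed positive-class prior."""
    sigmoid_loss = lambda s, y: torch.sigmoid(-s * y)      # surrogate 0-1 loss
    pos, unl = scores[labels == 1], scores[labels == 0]
    r_pos = sigmoid_loss(pos, 1).mean()                    # positive risk
    r_neg = sigmoid_loss(unl, -1).mean() - prior * sigmoid_loss(pos, -1).mean()
    return prior * r_pos + torch.clamp(r_neg, min=0.0)     # clamp keeps risk non-negative

scores = torch.randn(128, requires_grad=True)              # classifier logits
labels = torch.randint(0, 2, (128,))                       # 1 = manipulated, 0 = unlabeled
nnpu_loss(scores, labels).backward()
```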