Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhang Chen

PanoDreamer: Consistent Text to 360-Degree Scene Generation

Apr 07, 2025

Zhexiao Xiong, Zhang Chen, Zhong Li, Yi Xu, Nathan Jacobs

Abstract:Automatically generating a complete 3D scene from a text description, a reference image, or both has significant applications in fields like virtual reality and gaming. However, current methods often generate low-quality textures and inconsistent 3D structures. This is especially true when extrapolating significantly beyond the field of view of the reference image. To address these challenges, we propose PanoDreamer, a novel framework for consistent, 3D scene generation with flexible text and image control. Our approach employs a large language model and a warp-refine pipeline, first generating an initial set of images and then compositing them into a 360-degree panorama. This panorama is then lifted into 3D to form an initial point cloud. We then use several approaches to generate additional images, from different viewpoints, that are consistent with the initial point cloud and expand/refine the initial point cloud. Given the resulting set of images, we utilize 3D Gaussian Splatting to create the final 3D scene, which can then be rendered from different viewpoints. Experiments demonstrate the effectiveness of PanoDreamer in generating high-quality, geometrically consistent 3D scenes.

* Accepted by CVPR 2025 Workshop on Computer Vision for Metaverse

Via

Access Paper or Ask Questions

RBFIM: Perceptual Quality Assessment for Compressed Point Clouds Using Radial Basis Function Interpolation

Mar 18, 2025

Zhang Chen, Shuai Wan, Siyu Ren, Fuzheng Yang, Mengting Yu, Junhui Hou

Abstract:One of the main challenges in point cloud compression (PCC) is how to evaluate the perceived distortion so that the codec can be optimized for perceptual quality. Current standard practices in PCC highlight a primary issue: while single-feature metrics are widely used to assess compression distortion, the classic method of searching point-to-point nearest neighbors frequently fails to adequately build precise correspondences between point clouds, resulting in an ineffective capture of human perceptual features. To overcome the related limitations, we propose a novel assessment method called RBFIM, utilizing radial basis function (RBF) interpolation to convert discrete point features into a continuous feature function for the distorted point cloud. By substituting the geometry coordinates of the original point cloud into the feature function, we obtain the bijective sets of point features. This enables an establishment of precise corresponding features between distorted and original point clouds and significantly improves the accuracy of quality assessments. Moreover, this method avoids the complexity caused by bidirectional searches. Extensive experiments on multiple subjective quality datasets of compressed point clouds demonstrate that our RBFIM excels in addressing human perception tasks, thereby providing robust support for PCC optimization efforts.

Via

Access Paper or Ask Questions

InfinitePOD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers

Feb 07, 2025

Chenchen Shou, Guyue Liu, Hao Nie, Huaiyu Meng, Yu Zhou, Yimin Jiang, Wenqing Lv, Yelong Xu, Yuanwei Lu, Zhang Chen(+4 more)

Abstract:Scaling Large Language Model (LLM) training relies on multi-dimensional parallelism, where High-Bandwidth Domains (HBDs) are critical for communication-intensive parallelism like Tensor Parallelism (TP) and Expert Parallelism (EP). However, existing HBD architectures face fundamental limitations in scalability, cost, and fault resiliency: switch-centric HBDs (e.g., NVL-72) incur prohibitive scaling costs, while GPU-centric HBDs (e.g., TPUv3/Dojo) suffer from severe fault propagation. Switch-GPU hybrid HBDs such as TPUv4 takes a middle-ground approach by leveraging Optical Circuit Switches, but the fault explosion radius remains large at the cube level (e.g., 64 TPUs). We propose InfinitePOD, a novel transceiver-centric HBD architecture that unifies connectivity and dynamic switching at the transceiver level using Optical Circuit Switching (OCS). By embedding OCS within each transceiver, InfinitePOD achieves reconfigurable point-to-multipoint connectivity, allowing the topology to adapt into variable-size rings. This design provides: i) datacenter-wide scalability without cost explosion; ii) fault resilience by isolating failures to a single node, and iii) full bandwidth utilization for fault-free GPUs. Key innovations include a Silicon Photonic (SiPh) based low-cost OCS transceiver (OCSTrx), a reconfigurable k-hop ring topology co-designed with intra-/inter-node communication, and an HBD-DCN orchestration algorithm maximizing GPU utilization while minimizing cross-ToR datacenter network traffic. The evaluation demonstrates that InfinitePOD achieves 31% of the cost of NVL-72, near-zero GPU waste ratio (over one order of magnitude lower than NVL-72 and TPUv4), near-zero cross-ToR traffic when node fault ratios under 7%, and improves Model FLOPs Utilization by 3.37x compared to NVIDIA DGX (8 GPUs per Node).

Via

Access Paper or Ask Questions

SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface Reconstruction

Dec 19, 2024

Zhuowen Shen, Yuan Liu, Zhang Chen, Zhong Li, Jiepeng Wang, Yongqing Liang, Zhengming Yu, Jingdong Zhang, Yi Xu, Scott Schaefer(+2 more)

Figure 1 for SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface Reconstruction

Figure 2 for SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface Reconstruction

Figure 3 for SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface Reconstruction

Figure 4 for SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface Reconstruction

Abstract:Gaussian splatting has achieved impressive improvements for both novel-view synthesis and surface reconstruction from multi-view images. However, current methods still struggle to reconstruct high-quality surfaces from only sparse view input images using Gaussian splatting. In this paper, we propose a novel method called SolidGS to address this problem. We observed that the reconstructed geometry can be severely inconsistent across multi-views, due to the property of Gaussian function in geometry rendering. This motivates us to consolidate all Gaussians by adopting a more solid kernel function, which effectively improves the surface reconstruction quality. With the additional help of geometrical regularization and monocular normal estimation, our method achieves superior performance on the sparse view surface reconstruction than all the Gaussian splatting methods and neural field methods on the widely used DTU, Tanks-and-Temples, and LLFF datasets.

* Project page: https://mickshen7558.github.io/projects/SolidGS/

Via

Access Paper or Ask Questions

SparseLGS: Sparse View Language Embedded Gaussian Splatting

Dec 03, 2024

Jun Hu, Zhang Chen, Zhong Li, Yi Xu, Juyong Zhang

Figure 1 for SparseLGS: Sparse View Language Embedded Gaussian Splatting

Figure 2 for SparseLGS: Sparse View Language Embedded Gaussian Splatting

Figure 3 for SparseLGS: Sparse View Language Embedded Gaussian Splatting

Figure 4 for SparseLGS: Sparse View Language Embedded Gaussian Splatting

Abstract:Recently, several studies have combined Gaussian Splatting to obtain scene representations with language embeddings for open-vocabulary 3D scene understanding. While these methods perform well, they essentially require very dense multi-view inputs, limiting their applicability in real-world scenarios. In this work, we propose SparseLGS to address the challenge of 3D scene understanding with pose-free and sparse view input images. Our method leverages a learning-based dense stereo model to handle pose-free and sparse inputs, and a three-step region matching approach to address the multi-view semantic inconsistency problem, which is especially important for sparse inputs. Different from directly learning high-dimensional CLIP features, we extract low-dimensional information and build bijections to avoid excessive learning and storage costs. We introduce a reconstruction loss during semantic training to improve Gaussian positions and shapes. To the best of our knowledge, we are the first to address the 3D semantic field problem with sparse pose-free inputs. Experimental results show that SparseLGS achieves comparable quality when reconstructing semantic fields with fewer inputs (3-4 views) compared to previous SOTA methods with dense input. Besides, when using the same sparse input, SparseLGS leads significantly in quality and heavily improves the computation speed (5$\times$ speedup). Project page: {\tt\small \url{https://ustc3dv.github.io/SparseLGS}}

* Project Page: https://ustc3dv.github.io/SparseLGS

Via

Access Paper or Ask Questions

Equilibrium Adaptation-Based Control for Track Stand of Single-Track Two-Wheeled Robots

Oct 25, 2024

Boyi Wang, Yang Deng, Feilong Jing, Yiyong Sun, Zhang Chen, Bin Liang

Figure 1 for Equilibrium Adaptation-Based Control for Track Stand of Single-Track Two-Wheeled Robots

Figure 2 for Equilibrium Adaptation-Based Control for Track Stand of Single-Track Two-Wheeled Robots

Figure 3 for Equilibrium Adaptation-Based Control for Track Stand of Single-Track Two-Wheeled Robots

Figure 4 for Equilibrium Adaptation-Based Control for Track Stand of Single-Track Two-Wheeled Robots

Abstract:Stationary balance control is challenging for single-track two-wheeled (STTW) robots due to the lack of elegant balancing mechanisms and the conflict between the limited attraction domain and external disturbances. To address the absence of balancing mechanisms, we draw inspiration from cyclists and leverage the track stand maneuver, which relies solely on steering and rear-wheel actuation. To achieve accurate tracking in the presence of matched and mismatched disturbances, we propose an equilibrium adaptation-based control (EABC) scheme that can be seamlessly integrated with standard disturbance observers and controllers. This scheme enables adaptation to slow-varying disturbances by utilizing a disturbed equilibrium estimator, effectively handling both matched and mismatched disturbances in a unified manner while ensuring accurate tracking with zero steady-state error. We integrate the EABC scheme with nonlinear model predictive control (MPC) for the track stand of STTW robots and validate its effectiveness through two experimental scenarios. Our method demonstrates significant improvements in tracking accuracy, reducing errors by several orders of magnitude.

* 11 pages, 7 figures

Via

Access Paper or Ask Questions

PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance

Aug 04, 2024

Aoming Liu, Zhong Li, Zhang Chen, Nannan Li, Yi Xu, Bryan A. Plummer

Figure 1 for PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance

Figure 2 for PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance

Figure 3 for PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance

Figure 4 for PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance

Abstract:Immersive scene generation, notably panorama creation, benefits significantly from the adaptation of large pre-trained text-to-image (T2I) models for multi-view image generation. Due to the high cost of acquiring multi-view images, tuning-free generation is preferred. However, existing methods are either limited to simple correspondences or require extensive fine-tuning to capture complex ones. We present PanoFree, a novel method for tuning-free multi-view image generation that supports an extensive array of correspondences. PanoFree sequentially generates multi-view images using iterative warping and inpainting, addressing the key issues of inconsistency and artifacts from error accumulation without the need for fine-tuning. It improves error accumulation by enhancing cross-view awareness and refines the warping and inpainting processes via cross-view guidance, risky area estimation and erasing, and symmetric bidirectional guided generation for loop closure, alongside guidance-based semantic and density control for scene structure preservation. In experiments on Planar, 360{\deg}, and Full Spherical Panoramas, PanoFree demonstrates significant error reduction, improves global consistency, and boosts image quality without extra fine-tuning. Compared to existing methods, PanoFree is up to 5x more efficient in time and 3x more efficient in GPU memory usage, and maintains superior diversity of results (2x better in our user study). PanoFree offers a viable alternative to costly fine-tuning or the use of additional pre-trained models. Project website at https://panofree.github.io/.

* Accepted by ECCV 2024

Via

Access Paper or Ask Questions

Multi-Agent Causal Discovery Using Large Language Models

Jul 21, 2024

Hao Duong Le, Xin Xia, Zhang Chen

Figure 1 for Multi-Agent Causal Discovery Using Large Language Models

Figure 2 for Multi-Agent Causal Discovery Using Large Language Models

Figure 3 for Multi-Agent Causal Discovery Using Large Language Models

Figure 4 for Multi-Agent Causal Discovery Using Large Language Models

Abstract:Large Language Models (LLMs) have demonstrated significant potential in causal discovery tasks by utilizing their vast expert knowledge from extensive text corpora. However, the multi-agent capabilities of LLMs in causal discovery remain underexplored. This paper introduces a general framework to investigate this potential. The first is the Meta Agents Model, which relies exclusively on reasoning and discussions among LLM agents to conduct causal discovery. The second is the Coding Agents Model, which leverages the agents' ability to plan, write, and execute code, utilizing advanced statistical libraries for causal discovery. The third is the Hybrid Model, which integrates both the Meta Agents Model and CodingAgents Model approaches, combining the statistical analysis and reasoning skills of multiple agents. Our proposed framework shows promising results by effectively utilizing LLMs expert knowledge, reasoning capabilities, multi-agent cooperation, and statistical causal methods. By exploring the multi-agent potential of LLMs, we aim to establish a foundation for further research in utilizing LLMs multi-agent for solving causal-related problems.

Via

Access Paper or Ask Questions

Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis

Jun 14, 2024

Zhang Chen, Luca Demetrio, Srishti Gupta, Xiaoyi Feng, Zhaoqiang Xia, Antonio Emanuele Cinà, Maura Pintor, Luca Oneto, Ambra Demontis, Battista Biggio(+1 more)

Abstract:Thanks to their extensive capacity, over-parameterized neural networks exhibit superior predictive capabilities and generalization. However, having a large parameter space is considered one of the main suspects of the neural networks' vulnerability to adversarial example -- input samples crafted ad-hoc to induce a desired misclassification. Relevant literature has claimed contradictory remarks in support of and against the robustness of over-parameterized networks. These contradictory findings might be due to the failure of the attack employed to evaluate the networks' robustness. Previous research has demonstrated that depending on the considered model, the algorithm employed to generate adversarial examples may not function properly, leading to overestimating the model's robustness. In this work, we empirically study the robustness of over-parameterized networks against adversarial examples. However, unlike the previous works, we also evaluate the considered attack's reliability to support the results' veracity. Our results show that over-parameterized networks are robust against adversarial attacks as opposed to their under-parameterized counterparts.

Via

Access Paper or Ask Questions

OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

Jun 14, 2024

Yuzhong Huang, Zhong Li, Zhang Chen, Zhiyuan Ren, Guosheng Lin, Fred Morstatter, Yi Xu

Figure 1 for OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

Figure 2 for OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

Figure 3 for OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

Figure 4 for OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

Abstract:In the evolving landscape of text-to-3D technology, Dreamfusion has showcased its proficiency by utilizing Score Distillation Sampling (SDS) to optimize implicit representations such as NeRF. This process is achieved through the distillation of pretrained large-scale text-to-image diffusion models. However, Dreamfusion encounters fidelity and efficiency constraints: it faces the multi-head Janus issue and exhibits a relatively slow optimization process. To circumvent these challenges, we introduce OrientDream, a camera orientation conditioned framework designed for efficient and multi-view consistent 3D generation from textual prompts. Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module. This feature effectively utilizes data from MVImgNet, an extensive external multi-view dataset, to refine and bolster its functionality. Subsequently, we utilize the pre-conditioned 2D images as a basis for optimizing a randomly initialized implicit representation (NeRF). This process is significantly expedited by a decoupled back-propagation technique, allowing for multiple updates of implicit parameters per optimization cycle. Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also achieves an optimization speed significantly greater than existing methods, as quantified by comparative metrics.

Via

Access Paper or Ask Questions