Abstract: Generative artificial intelligence (AI) has made significant progress across various domains in recent years. Building on rapid advances in 2D, video, and 3D content generation, 4D generation has emerged as a novel and rapidly evolving research area that is attracting growing attention. 4D generation focuses on creating dynamic 3D assets with spatiotemporal consistency based on user input, offering greater creative freedom and richer immersive experiences. This paper presents a comprehensive survey of the 4D generation field, systematically summarizing its core technologies, developmental trajectory, key challenges, and practical applications, while also exploring potential future research directions. The survey begins by introducing the fundamental 4D representation models, followed by a review of the 4D generation frameworks built upon these representations and the key technologies that incorporate motion and geometry priors into 4D assets. We summarize five major challenges of 4D generation (consistency, controllability, diversity, efficiency, and fidelity) and outline existing solutions to each. We systematically analyze applications of 4D generation, spanning dynamic object generation, scene generation, digital human synthesis, 4D editing, and autonomous driving. Finally, we provide an in-depth discussion of the obstacles currently hindering the development of 4D generation. This survey offers a clear and comprehensive overview of 4D generation, aiming to stimulate further exploration and innovation in this rapidly evolving field. Our code is publicly available at: https://github.com/MiaoQiaowei/Awesome-4D.
Abstract: 3D reconstruction and simulation, while interrelated, have distinct objectives: reconstruction demands a flexible 3D representation adaptable to diverse scenes, whereas simulation requires a structured representation to model motion principles effectively. This paper introduces the Mesh-adsorbed Gaussian Splatting (MaGS) method to resolve this dilemma. MaGS constrains 3D Gaussians to hover on the mesh surface, creating a mutually adsorbed mesh-Gaussian 3D representation that combines the rendering flexibility of 3D Gaussians with the spatial coherence of meshes. Leveraging this representation, we introduce a learnable Relative Deformation Field (RDF) to model the relative displacement between the mesh and 3D Gaussians, extending traditional mesh-driven deformation paradigms that rely only on the ARAP prior and thus capturing the motion of each 3D Gaussian more precisely. By jointly optimizing the meshes, 3D Gaussians, and RDF, MaGS achieves both high rendering accuracy and realistic deformation. Extensive experiments on the D-NeRF and NeRF-DS datasets demonstrate that MaGS can generate competitive results in both reconstruction and simulation.
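To make the idea of Gaussians that hover on a mesh surface more concrete, the following is a minimal sketch and not the MaGS implementation. It assumes each Gaussian center is attached to one triangle through barycentric coordinates plus a signed offset along the triangle normal, and that the learned RDF can be stood in for by a plain per-Gaussian displacement array. The function names, the parameterization, and the `rel_disp` argument are all hypothetical and chosen only for illustration.

```python
import numpy as np


def triangle_normals(vertices, faces):
    """Return the unit normal of every triangle; faces index into vertices."""
    a, b, c = (vertices[faces[:, i]] for i in range(3))
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n, axis=1, keepdims=True)


def gaussian_centers(vertices, faces, face_ids, bary, height, rel_disp=None):
    """Place Gaussian centers relative to a (possibly deformed) mesh.

    vertices : (V, 3) mesh vertex positions
    faces    : (F, 3) integer vertex indices per triangle
    face_ids : (G,)   triangle each Gaussian is attached to (assumed)
    bary     : (G, 3) barycentric coordinates, each row sums to 1 (assumed)
    height   : (G,)   signed offset along the triangle normal (assumed)
    rel_disp : (G, 3) optional extra displacement; a hypothetical stand-in
               for the output of a learned relative deformation field
    """
    tri = vertices[faces[face_ids]]                   # (G, 3, 3) corner positions
    on_surface = np.einsum("gk,gkd->gd", bary, tri)   # barycentric interpolation
    normals = triangle_normals(vertices, faces)[face_ids]
    centers = on_surface + height[:, None] * normals  # hover above the surface
    if rel_disp is not None:
        centers = centers + rel_disp                  # relative mesh-to-Gaussian motion
    return centers


# One triangle in the xy-plane with one Gaussian hovering above its centroid.
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
face_ids = np.array([0])
bary = np.full((1, 3), 1.0 / 3.0)
height = np.array([0.1])

rest = gaussian_centers(vertices, faces, face_ids, bary, height)

# Deforming the mesh (here a simple translation) carries the Gaussian along.
moved = gaussian_centers(vertices + np.array([0.0, 0.0, 2.0]),
                         faces, face_ids, bary, height)
```

Because the centers are recomputed from the current vertex positions, any mesh deformation moves the attached Gaussians with it, while `rel_disp` lets each Gaussian depart from the purely mesh-driven motion. In a learned setting, these quantities would be optimized jointly with the mesh rather than fixed as they are here.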