Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hooman Shayani

Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings

Nov 12, 2024

Aditya Sanghi, Aliasghar Khani, Pradyumna Reddy, Arianna Rampini, Derek Cheung, Kamal Rahimi Malekshan, Kanika Madan, Hooman Shayani

Figure 1 for Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings

Figure 2 for Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings

Figure 3 for Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings

Figure 4 for Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings

Abstract:Large-scale 3D generative models require substantial computational resources yet often fall short in capturing fine details and complex geometries at high resolutions. We attribute this limitation to the inefficiency of current representations, which lack the compactness required to model the generative models effectively. To address this, we introduce a novel approach called Wavelet Latent Diffusion, or WaLa, that encodes 3D shapes into wavelet-based, compact latent encodings. Specifically, we compress a $256^3$ signed distance field into a $12^3 \times 4$ latent grid, achieving an impressive 2427x compression ratio with minimal loss of detail. This high level of compression allows our method to efficiently train large-scale generative networks without increasing the inference time. Our models, both conditional and unconditional, contain approximately one billion parameters and successfully generate high-quality 3D shapes at $256^3$ resolution. Moreover, WaLa offers rapid inference, producing shapes within two to four seconds depending on the condition, despite the model's scale. We demonstrate state-of-the-art performance across multiple datasets, with significant improvements in generation quality, diversity, and computational efficiency. We open-source our code and, to the best of our knowledge, release the largest pretrained 3D generative models across different modalities.

Via

Access Paper or Ask Questions

Make-A-Shape: a Ten-Million-scale 3D Shape Model

Jan 20, 2024

Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, Chi-Wing Fu

Figure 1 for Make-A-Shape: a Ten-Million-scale 3D Shape Model

Figure 2 for Make-A-Shape: a Ten-Million-scale 3D Shape Model

Figure 3 for Make-A-Shape: a Ten-Million-scale 3D Shape Model

Figure 4 for Make-A-Shape: a Ten-Million-scale 3D Shape Model

Abstract:Significant progress has been made in training large generative models for natural language and images. Yet, the advancement of 3D generative models is hindered by their substantial resource demands for training, along with inefficient, non-compact, and less expressive representations. This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale, capable of utilizing 10 millions publicly-available shapes. Technical-wise, we first innovate a wavelet-tree representation to compactly encode shapes by formulating the subband coefficient filtering scheme to efficiently exploit coefficient relations. We then make the representation generatable by a diffusion model by devising the subband coefficients packing scheme to layout the representation in a low-resolution grid. Further, we derive the subband adaptive training strategy to train our model to effectively learn to generate coarse and detail wavelet coefficients. Last, we extend our framework to be controlled by additional input conditions to enable it to generate shapes from assorted modalities, e.g., single/multi-view images, point clouds, and low-resolution voxels. In our extensive set of experiments, we demonstrate various applications, such as unconditional generation, shape completion, and conditional generation on a wide range of modalities. Our approach not only surpasses the state of the art in delivering high-quality results but also efficiently generates shapes within a few seconds, often achieving this in just 2 seconds for most conditions.

Via

Access Paper or Ask Questions

Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation

Jul 08, 2023

Aditya Sanghi, Pradeep Kumar Jayaraman, Arianna Rampini, Joseph Lambourne, Hooman Shayani, Evan Atherton, Saeid Asgari Taghanaki

Abstract:Significant progress has recently been made in creative applications of large pre-trained models for downstream tasks in 3D vision, such as text-to-shape generation. This motivates our investigation of how these pre-trained models can be used effectively to generate 3D shapes from sketches, which has largely remained an open challenge due to the limited sketch-shape paired datasets and the varying level of abstraction in the sketches. We discover that conditioning a 3D generative model on the features (obtained from a frozen large pre-trained vision model) of synthetic renderings during training enables us to effectively generate 3D shapes from sketches at inference time. This suggests that the large pre-trained vision model features carry semantic signals that are resilient to domain shifts, i.e., allowing us to use only RGB renderings, but generalizing to sketches at inference time. We conduct a comprehensive set of experiments investigating different design factors and demonstrate the effectiveness of our straightforward approach for generation of multiple 3D shapes per each input sketch regardless of their level of abstraction without requiring any paired datasets during training.

Via

Access Paper or Ask Questions

TextCraft: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Text

Nov 04, 2022

Aditya Sanghi, Rao Fu, Vivian Liu, Karl Willis, Hooman Shayani, Amir Hosein Khasahmadi, Srinath Sridhar, Daniel Ritchie

Figure 1 for TextCraft: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Text

Figure 2 for TextCraft: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Text

Figure 3 for TextCraft: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Text

Figure 4 for TextCraft: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Text

Abstract:Language is one of the primary means by which we describe the 3D world around us. While rapid progress has been made in text-to-2D-image synthesis, similar progress in text-to-3D-shape synthesis has been hindered by the lack of paired (text, shape) data. Moreover, extant methods for text-to-shape generation have limited shape diversity and fidelity. We introduce TextCraft, a method to address these limitations by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs for training. TextCraft achieves this by using CLIP and using a multi-resolution approach by first generating in a low-dimensional latent space and then upscaling to a higher resolution, improving the fidelity of the generated shape. To improve shape diversity, we use a discrete latent space which is modelled using a bidirectional transformer conditioned on the interchangeable image-text embedding space induced by CLIP. Moreover, we present a novel variant of classifier-free guidance, which further improves the accuracy-diversity trade-off. Finally, we perform extensive experiments that demonstrate that TextCraft outperforms state-of-the-art baselines.

Via

Access Paper or Ask Questions

UNIST: Unpaired Neural Implicit Shape Translation Network

Dec 10, 2021

Qimin Chen, Johannes Merz, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, Hao Zhang

Figure 1 for UNIST: Unpaired Neural Implicit Shape Translation Network

Figure 2 for UNIST: Unpaired Neural Implicit Shape Translation Network

Figure 3 for UNIST: Unpaired Neural Implicit Shape Translation Network

Figure 4 for UNIST: Unpaired Neural Implicit Shape Translation Network

Abstract:We introduce UNIST, the first deep neural implicit model for general-purpose, unpaired shape-to-shape translation, in both 2D and 3D domains. Our model is built on autoencoding implicit fields, rather than point clouds which represents the state of the art. Furthermore, our translation network is trained to perform the task over a latent grid representation which combines the merits of both latent-space processing and position awareness, to not only enable drastic shape transforms but also well preserve spatial features and fine local details for natural shape translations. With the same network architecture and only dictated by the input domain pairs, our model can learn both style-preserving content alteration and content-preserving style transfer. We demonstrate the generality and quality of the translation results, and compare them to well-known baselines.

* project page: https://qiminchen.github.io/unist/

Via

Access Paper or Ask Questions

UVStyle-Net: Unsupervised Few-shot Learning of 3D Style Similarity Measure for B-Reps

Apr 28, 2021

Peter Meltzer, Hooman Shayani, Amir Khasahmadi, Pradeep Kumar Jayaraman, Aditya Sanghi, Joseph Lambourne

Figure 1 for UVStyle-Net: Unsupervised Few-shot Learning of 3D Style Similarity Measure for B-Reps

Figure 2 for UVStyle-Net: Unsupervised Few-shot Learning of 3D Style Similarity Measure for B-Reps

Figure 3 for UVStyle-Net: Unsupervised Few-shot Learning of 3D Style Similarity Measure for B-Reps

Figure 4 for UVStyle-Net: Unsupervised Few-shot Learning of 3D Style Similarity Measure for B-Reps

Abstract:Boundary Representations (B-Reps) are the industry standard in 3D Computer Aided Design/Manufacturing (CAD/CAM) and industrial design due to their fidelity in representing stylistic details. However, they have been ignored in the 3D style research. Existing 3D style metrics typically operate on meshes or pointclouds, and fail to account for end-user subjectivity by adopting fixed definitions of style, either through crowd-sourcing for style labels or hand-crafted features. We propose UVStyle-Net, a style similarity measure for B-Reps that leverages the style signals in the second order statistics of the activations in a pre-trained (unsupervised) 3D encoder, and learns their relative importance to a subjective end-user through few-shot learning. Our approach differs from all existing data-driven 3D style methods since it may be used in completely unsupervised settings, which is desirable given the lack of publicly available labelled B-Rep datasets. More importantly, the few-shot learning accounts for the inherent subjectivity associated with style. We show quantitatively that our proposed method with B-Reps is able to capture stronger style signals than alternative methods on meshes and pointclouds despite its significantly greater computational efficiency. We also show it is able to generate meaningful style gradients with respect to the input shape, and that few-shot learning with as few as two positive examples selected by an end-user is sufficient to significantly improve the style measure. Finally, we demonstrate its efficacy on a large unlabeled public dataset of CAD models. Source code and data will be released in the future.

* 13 pages, 20 figures, 5 tables

Via

Access Paper or Ask Questions

CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive Assembly

Apr 12, 2021

Fenggen Yu, Zhiqin Chen, Manyi Li, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, Hao Zhang

Figure 1 for CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive Assembly

Figure 2 for CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive Assembly

Figure 3 for CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive Assembly

Figure 4 for CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive Assembly

Abstract:We introduce CAPRI-Net, a neural network for learning compact and interpretable implicit representations of 3D computer-aided design (CAD) models, in the form of adaptive primitive assemblies. Our network takes an input 3D shape that can be provided as a point cloud or voxel grids, and reconstructs it by a compact assembly of quadric surface primitives via constructive solid geometry (CSG) operations. The network is self-supervised with a reconstruction loss, leading to faithful 3D reconstructions with sharp edges and plausible CSG trees, without any ground-truth shape assemblies. While the parametric nature of CAD models does make them more predictable locally, at the shape level, there is a great deal of structural and topological variations, which present a significant generalizability challenge to state-of-the-art neural models for 3D shapes. Our network addresses this challenge by adaptive training with respect to each test shape, with which we fine-tune the network that was pre-trained on a model collection. We evaluate our learning framework on both ShapeNet and ABC, the largest and most diverse CAD dataset to date, in terms of reconstruction quality, shape edges, compactness, and interpretability, to demonstrate superiority over current alternatives suitable for neural CAD reconstruction.

Via

Access Paper or Ask Questions

BRepNet: A topological message passing system for solid models

Apr 08, 2021

Joseph G. Lambourne, Karl D. D. Willis, Pradeep Kumar Jayaraman, Aditya Sanghi, Peter Meltzer, Hooman Shayani

Figure 1 for BRepNet: A topological message passing system for solid models

Figure 2 for BRepNet: A topological message passing system for solid models

Figure 3 for BRepNet: A topological message passing system for solid models

Figure 4 for BRepNet: A topological message passing system for solid models

Abstract:Boundary representation (B-rep) models are the standard way 3D shapes are described in Computer-Aided Design (CAD) applications. They combine lightweight parametric curves and surfaces with topological information which connects the geometric entities to describe manifolds. In this paper we introduce BRepNet, a neural network architecture designed to operate directly on B-rep data structures, avoiding the need to approximate the model as meshes or point clouds. BRepNet defines convolutional kernels with respect to oriented coedges in the data structure. In the neighborhood of each coedge, a small collection of faces, edges and coedges can be identified and patterns in the feature vectors from these entities detected by specific learnable parameters. In addition, to encourage further deep learning research with B-reps, we publish the Fusion 360 Gallery segmentation dataset. A collection of over 35,000 B-rep models annotated with information about the modeling operations which created each face. We demonstrate that BRepNet can segment these models with higher accuracy than methods working on meshes, and point clouds.

* CVPR 2021 Oral

Via

Access Paper or Ask Questions

UV-Net: Learning from Curve-Networks and Solids

Jun 18, 2020

Pradeep Kumar Jayaraman, Aditya Sanghi, Joseph Lambourne, Thomas Davies, Hooman Shayani, Nigel Morris

Figure 1 for UV-Net: Learning from Curve-Networks and Solids

Figure 2 for UV-Net: Learning from Curve-Networks and Solids

Figure 3 for UV-Net: Learning from Curve-Networks and Solids

Figure 4 for UV-Net: Learning from Curve-Networks and Solids

Abstract:Parametric curves, surfaces and boundary representations are the basis for 2D vector graphics and 3D industrial designs. Despite their prevalence, there exists limited research on applying modern deep neural networks directly to such representations. The unique challenges in working with such representations arise from the combination of continuous non-Euclidean geometry domain and discrete topology, as well as a lack of labeled datasets, benchmarks and baseline models. In this paper, we propose a unified representation for parametric curve-networks and solids by exploiting the u- and uv-parameter domains of curve and surfaces, respectively, to model the geometry, and an adjacency graph to explicitly model the topology. This leads to a unique and efficient network architecture based on coupled image and graph convolutional neural networks to extract features from curve-networks and solids. Inspired by the MNIST image dataset, we create and publish WireMNIST (for 2D curve-networks) and SolidMNIST (for 3D solids), two related labeled datasets depicting alphabets to encourage future research in this area. We demonstrate the effectiveness of our method using supervised and self-supervised tasks on our new datasets, as well as the publicly available ABC dataset. The results demonstrate the effectiveness of our representation and provide a competitive baseline for learning tasks involving curve-networks and solids.

Via

Access Paper or Ask Questions