Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yinan Zhang

Cube: A Roblox View of 3D Intelligence

Mar 19, 2025

Foundation AI Team, Kiran Bhat, Nishchaie Khanna, Karun Channa, Tinghui Zhou, Yiheng Zhu, Xiaoxia Sun, Charles Shang, Anirudh Sudarshan, Maurice Chu(+35 more)

Abstract:Foundation models trained on vast amounts of data have demonstrated remarkable reasoning and generation capabilities in the domains of text, images, audio and video. Our goal at Roblox is to build such a foundation model for 3D intelligence, a model that can support developers in producing all aspects of a Roblox experience, from generating 3D objects and scenes to rigging characters for animation to producing programmatic scripts describing object behaviors. We discuss three key design requirements for such a 3D foundation model and then present our first step towards building such a model. We expect that 3D geometric shapes will be a core data type and describe our solution for 3D shape tokenizer. We show how our tokenization scheme can be used in applications for text-to-shape generation, shape-to-text generation and text-to-scene generation. We demonstrate how these applications can collaborate with existing large language models (LLMs) to perform scene analysis and reasoning. We conclude with a discussion outlining our path to building a fully unified foundation model for 3D intelligence.

* Our code and model weights can be found at: https://github.com/Roblox/cube

Via

Access Paper or Ask Questions

Enabling Patient-side Disease Prediction via the Integration of Patient Narratives

May 05, 2024

Zhixiang Su, Yinan Zhang, Jiazheng Jing, Jie Xiao, Zhiqi Shen

Abstract:Disease prediction holds considerable significance in modern healthcare, because of its crucial role in facilitating early intervention and implementing effective prevention measures. However, most recent disease prediction approaches heavily rely on laboratory test outcomes (e.g., blood tests and medical imaging from X-rays). Gaining access to such data for precise disease prediction is often a complex task from the standpoint of a patient and is always only available post-patient consultation. To make disease prediction available from patient-side, we propose Personalized Medical Disease Prediction (PoMP), which predicts diseases using patient health narratives including textual descriptions and demographic information. By applying PoMP, patients can gain a clearer comprehension of their conditions, empowering them to directly seek appropriate medical specialists and thereby reducing the time spent navigating healthcare communication to locate suitable doctors. We conducted extensive experiments using real-world data from Haodf to showcase the effectiveness of PoMP.

Via

Access Paper or Ask Questions

Large-scale Reinforcement Learning for Diffusion Models

Jan 20, 2024

Yinan Zhang, Eric Tzeng, Yilun Du, Dmitry Kislyuk

Abstract:Text-to-image diffusion models are a class of deep generative models that have demonstrated an impressive capacity for high-quality image generation. However, these models are susceptible to implicit biases that arise from web-scale text-image training pairs and may inaccurately model aspects of images we care about. This can result in suboptimal samples, model bias, and images that do not align with human ethics and preferences. In this paper, we present an effective scalable algorithm to improve diffusion models using Reinforcement Learning (RL) across a diverse set of reward functions, such as human preference, compositionality, and fairness over millions of images. We illustrate how our approach substantially outperforms existing methods for aligning diffusion models with human preferences. We further illustrate how this substantially improves pretrained Stable Diffusion (SD) models, generating samples that are preferred by humans 80.3% of the time over those from the base SD model while simultaneously improving both the composition and diversity of generated samples.

Via

Access Paper or Ask Questions

Capturing Popularity Trends: A Simplistic Non-Personalized Approach for Enhanced Item Recommendation

Aug 17, 2023

Jiazheng Jing, Yinan Zhang, Xin Zhou, Zhiqi Shen

Abstract:Recommender systems have been gaining increasing research attention over the years. Most existing recommendation methods focus on capturing users' personalized preferences through historical user-item interactions, which may potentially violate user privacy. Additionally, these approaches often overlook the significance of the temporal fluctuation in item popularity that can sway users' decision-making. To bridge this gap, we propose Popularity-Aware Recommender (PARE), which makes non-personalized recommendations by predicting the items that will attain the highest popularity. PARE consists of four modules, each focusing on a different aspect: popularity history, temporal impact, periodic impact, and side information. Finally, an attention layer is leveraged to fuse the outputs of four modules. To our knowledge, this is the first work to explicitly model item popularity in recommendation systems. Extensive experiments show that PARE performs on par or even better than sophisticated state-of-the-art recommendation methods. Since PARE prioritizes item popularity over personalized user preferences, it can enhance existing recommendation methods as a complementary component. Our experiments demonstrate that integrating PARE with existing recommendation methods significantly surpasses the performance of standalone models, highlighting PARE's potential as a complement to existing recommendation methods. Furthermore, the simplicity of PARE makes it immensely practical for industrial applications and a valuable baseline for future research.

* 9 pages, 5 figures

Via

Access Paper or Ask Questions

Minimalist and High-performance Conversational Recommendation with Uncertainty Estimation for User Preference

Jul 01, 2022

Yinan Zhang, Boyang Li, Yong Liu, You Yuan, Chunyan Miao

Figure 1 for Minimalist and High-performance Conversational Recommendation with Uncertainty Estimation for User Preference

Figure 2 for Minimalist and High-performance Conversational Recommendation with Uncertainty Estimation for User Preference

Figure 3 for Minimalist and High-performance Conversational Recommendation with Uncertainty Estimation for User Preference

Figure 4 for Minimalist and High-performance Conversational Recommendation with Uncertainty Estimation for User Preference

Abstract:Conversational recommendation system (CRS) is emerging as a user-friendly way to capture users' dynamic preferences over candidate items and attributes. Multi-shot CRS is designed to make recommendations multiple times until the user either accepts the recommendation or leaves at the end of their patience. Existing works are trained with reinforcement learning (RL), which may suffer from unstable learning and prohibitively high demands for computing. In this work, we propose a simple and efficient CRS, MInimalist Non-reinforced Interactive COnversational Recommender Network (MINICORN). MINICORN models the epistemic uncertainty of the estimated user preference and queries the user for the attribute with the highest uncertainty. The system employs a simple network architecture and makes the query-vs-recommendation decision using a single rule. Somewhat surprisingly, this minimalist approach outperforms state-of-the-art RL methods on three real-world datasets by large margins. We hope that MINICORN will serve as a valuable baseline for future research.

Via

Access Paper or Ask Questions

Learning Large-scale Subsurface Simulations with a Hybrid Graph Network Simulator

Jun 15, 2022

Tailin Wu, Qinchen Wang, Yinan Zhang, Rex Ying, Kaidi Cao, Rok Sosič, Ridwan Jalali, Hassan Hamam, Marko Maucec, Jure Leskovec

Figure 1 for Learning Large-scale Subsurface Simulations with a Hybrid Graph Network Simulator

Figure 2 for Learning Large-scale Subsurface Simulations with a Hybrid Graph Network Simulator

Figure 3 for Learning Large-scale Subsurface Simulations with a Hybrid Graph Network Simulator

Figure 4 for Learning Large-scale Subsurface Simulations with a Hybrid Graph Network Simulator

Abstract:Subsurface simulations use computational models to predict the flow of fluids (e.g., oil, water, gas) through porous media. These simulations are pivotal in industrial applications such as petroleum production, where fast and accurate models are needed for high-stake decision making, for example, for well placement optimization and field development planning. Classical finite difference numerical simulators require massive computational resources to model large-scale real-world reservoirs. Alternatively, streamline simulators and data-driven surrogate models are computationally more efficient by relying on approximate physics models, however they are insufficient to model complex reservoir dynamics at scale. Here we introduce Hybrid Graph Network Simulator (HGNS), which is a data-driven surrogate model for learning reservoir simulations of 3D subsurface fluid flows. To model complex reservoir dynamics at both local and global scale, HGNS consists of a subsurface graph neural network (SGNN) to model the evolution of fluid flows, and a 3D-U-Net to model the evolution of pressure. HGNS is able to scale to grids with millions of cells per time step, two orders of magnitude higher than previous surrogate models, and can accurately predict the fluid flow for tens of time steps (years into the future). Using an industry-standard subsurface flow dataset (SPE-10) with 1.1 million cells, we demonstrate that HGNS is able to reduce the inference time up to 18 times compared to standard subsurface simulators, and that it outperforms other learning-based models by reducing long-term prediction errors by up to 21%.

* SIGKDD 2022; 11 pages, 6 figures

Via

Access Paper or Ask Questions

IA-GCN: Interactive Graph Convolutional Network for Recommendation

Apr 08, 2022

Yinan Zhang, Pei Wang, Xiwei Zhao, Hao Qi, Jie He, Junsheng Jin, Changping Peng, Zhangang Lin, Jingping Shao

Figure 1 for IA-GCN: Interactive Graph Convolutional Network for Recommendation

Figure 2 for IA-GCN: Interactive Graph Convolutional Network for Recommendation

Figure 3 for IA-GCN: Interactive Graph Convolutional Network for Recommendation

Figure 4 for IA-GCN: Interactive Graph Convolutional Network for Recommendation

Abstract:Recently, Graph Convolutional Network (GCN) has become a novel state-of-art for Collaborative Filtering (CF) based Recommender Systems (RS). It is a common practice to learn informative user and item representations by performing embedding propagation on a user-item bipartite graph, and then provide the users with personalized item suggestions based on the representations. Despite effectiveness, existing algorithms neglect precious interactive features between user-item pairs in the embedding process. When predicting a user's preference for different items, they still aggregate the user tree in the same way, without emphasizing target-related information in the user neighborhood. Such a uniform aggregation scheme easily leads to suboptimal user and item representations, limiting the model expressiveness to some extent. In this work, we address this problem by building bilateral interactive guidance between each user-item pair and proposing a new model named IA-GCN (short for InterActive GCN). Specifically, when learning the user representation from its neighborhood, we assign higher attention weights to those neighbors similar to the target item. Correspondingly, when learning the item representation, we pay more attention to those neighbors resembling the target user. This leads to interactive and interpretable features, effectively distilling target-specific information through each graph convolutional operation. Our model is built on top of LightGCN, a state-of-the-art GCN model for CF, and can be combined with various GCN-based CF architectures in an end-to-end fashion. Extensive experiments on three benchmark datasets demonstrate the effectiveness and robustness of IA-GCN.

* The code will be released after paper acceptance

Via

Access Paper or Ask Questions

Initialization Matters: Regularizing Manifold-informed Initialization for Neural Recommendation Systems

Jun 09, 2021

Yinan Zhang, Boyang Li, Yong Liu, Hao Wang, Chunyan Miao

Figure 1 for Initialization Matters: Regularizing Manifold-informed Initialization for Neural Recommendation Systems

Figure 2 for Initialization Matters: Regularizing Manifold-informed Initialization for Neural Recommendation Systems

Figure 3 for Initialization Matters: Regularizing Manifold-informed Initialization for Neural Recommendation Systems

Figure 4 for Initialization Matters: Regularizing Manifold-informed Initialization for Neural Recommendation Systems

Abstract:Proper initialization is crucial to the optimization and the generalization of neural networks. However, most existing neural recommendation systems initialize the user and item embeddings randomly. In this work, we propose a new initialization scheme for user and item embeddings called Laplacian Eigenmaps with Popularity-based Regularization for Isolated Data (LEPORID). LEPORID endows the embeddings with information regarding multi-scale neighborhood structures on the data manifold and performs adaptive regularization to compensate for high embedding variance on the tail of the data distribution. Exploiting matrix sparsity, LEPORID embeddings can be computed efficiently. We evaluate LEPORID in a wide range of neural recommendation models. In contrast to the recent surprising finding that the simple K-nearest-neighbor (KNN) method often outperforms neural recommendation systems, we show that existing neural systems initialized with LEPORID often perform on par or better than KNN. To maximize the effects of the initialization, we propose the Dual-Loss Residual Recommendation (DLR2) network, which, when initialized with LEPORID, substantially outperforms both traditional and state-of-the-art neural recommender systems.

Via

Access Paper or Ask Questions

Neural Radiance Flow for 4D View Synthesis and Video Processing

Dec 17, 2020

Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu

Figure 1 for Neural Radiance Flow for 4D View Synthesis and Video Processing

Figure 2 for Neural Radiance Flow for 4D View Synthesis and Video Processing

Figure 3 for Neural Radiance Flow for 4D View Synthesis and Video Processing

Figure 4 for Neural Radiance Flow for 4D View Synthesis and Video Processing

Abstract:We present a method, Neural Radiance Flow (NeRFlow),to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images. Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene. By enforcing consistency across different modalities, our representation enables multi-view rendering in diverse dynamic scenes, including water pouring, robotic interaction, and real images, outperforming state-of-the-art methods for spatial-temporal view synthesis. Our approach works even when inputs images are captured with only one camera. We further demonstrate that the learned representation can serve as an implicit scene prior, enabling video processing tasks such as image super-resolution and de-noising without any additional supervision.

* Website: https://yilundu.github.io/nerflow/

Via

Access Paper or Ask Questions

Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

Oct 18, 2020

Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, Sanja Fidler

Figure 1 for Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

Figure 2 for Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

Figure 3 for Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

Figure 4 for Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

Abstract:Differentiable rendering has paved the way to training neural networks to perform "inverse graphics" tasks such as predicting 3D geometry from monocular photographs. To train high performing models, most of the current approaches rely on multi-view imagery which are not readily available in practice. Recent Generative Adversarial Networks (GANs) that synthesize images, in contrast, seem to acquire 3D knowledge implicitly during training: object viewpoints can be manipulated by simply manipulating the latent codes. However, these latent codes often lack further physical interpretation and thus GANs cannot easily be inverted to perform explicit 3D reasoning. In this paper, we aim to extract and disentangle 3D knowledge learned by generative models by utilizing differentiable renderers. Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties. The entire architecture is trained iteratively using cycle consistency losses. We show that our approach significantly outperforms state-of-the-art inverse graphics networks trained on existing datasets, both quantitatively and via user studies. We further showcase the disentangled GAN as a controllable 3D "neural renderer", complementing traditional graphics renderers.

Via

Access Paper or Ask Questions