Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhen Zhou

Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation

Sep 16, 2025

Biwen Lei, Yang Li, Xinhai Liu, Shuhui Yang, Lixin Xu, Jingwei Huang, Ruining Tang, Haohan Weng, Jian Liu, Jing Xu(+90 more)

Figure 1 for Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation

Figure 2 for Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation

Figure 3 for Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation

Figure 4 for Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation

Abstract:The creation of high-quality 3D assets, a cornerstone of modern game development, has long been characterized by labor-intensive and specialized workflows. This paper presents Hunyuan3D Studio, an end-to-end AI-powered content creation platform designed to revolutionize the game production pipeline by automating and streamlining the generation of game-ready 3D assets. At its core, Hunyuan3D Studio integrates a suite of advanced neural modules (such as Part-level 3D Generation, Polygon Generation, Semantic UV, etc.) into a cohesive and user-friendly system. This unified framework allows for the rapid transformation of a single concept image or textual description into a fully-realized, production-quality 3D model complete with optimized geometry and high-fidelity PBR textures. We demonstrate that assets generated by Hunyuan3D Studio are not only visually compelling but also adhere to the stringent technical requirements of contemporary game engines, significantly reducing iteration time and lowering the barrier to entry for 3D content creation. By providing a seamless bridge from creative intent to technical asset, Hunyuan3D Studio represents a significant leap forward for AI-assisted workflows in game development and interactive media.

* Technical Report

Via

Access Paper or Ask Questions

HAD: Hybrid Architecture Distillation Outperforms Teacher in Genomic Sequence Modeling

May 27, 2025

Hexiong Yang, Mingrui Chen, Huaibo Huang, Junxian Duan, Jie Cao, Zhen Zhou, Ran He

Abstract:Inspired by the great success of Masked Language Modeling (MLM) in the natural language domain, the paradigm of self-supervised pre-training and fine-tuning has also achieved remarkable progress in the field of DNA sequence modeling. However, previous methods often relied on massive pre-training data or large-scale base models with huge parameters, imposing a significant computational burden. To address this, many works attempted to use more compact models to achieve similar outcomes but still fell short by a considerable margin. In this work, we propose a Hybrid Architecture Distillation (HAD) approach, leveraging both distillation and reconstruction tasks for more efficient and effective pre-training. Specifically, we employ the NTv2-500M as the teacher model and devise a grouping masking strategy to align the feature embeddings of visible tokens while concurrently reconstructing the invisible tokens during MLM pre-training. To validate the effectiveness of our proposed method, we conducted comprehensive experiments on the Nucleotide Transformer Benchmark and Genomic Benchmark. Compared to models with similar parameters, our model achieved excellent performance. More surprisingly, it even surpassed the distillation ceiling-teacher model on some sub-tasks, which is more than 500 $\times$ larger. Lastly, we utilize t-SNE for more intuitive visualization, which shows that our model can gain a sophisticated understanding of the intrinsic representation pattern in genomic sequences.

Via

Access Paper or Ask Questions

EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video

Sep 03, 2024

Zhen Zhou, Yunkai Ma, Junfeng Fan, Shaolin Zhang, Fengshui Jing, Min Tan

Figure 1 for EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video

Figure 2 for EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video

Figure 3 for EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video

Figure 4 for EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video

Abstract:Panoptic 3D reconstruction from a monocular video is a fundamental perceptual task in robotic scene understanding. However, existing efforts suffer from inefficiency in terms of inference speed and accuracy, limiting their practical applicability. We present EPRecon, an efficient real-time panoptic 3D reconstruction framework. Current volumetric-based reconstruction methods usually utilize multi-view depth map fusion to obtain scene depth priors, which is time-consuming and poses challenges to real-time scene reconstruction. To end this, we propose a lightweight module to directly estimate scene depth priors in a 3D volume for reconstruction quality improvement by generating occupancy probabilities of all voxels. In addition, to infer richer panoptic features from occupied voxels, EPRecon extracts panoptic features from both voxel features and corresponding image features, obtaining more detailed and comprehensive instance-level semantic information and achieving more accurate segmentation results. Experimental results on the ScanNetV2 dataset demonstrate the superiority of EPRecon over current state-of-the-art methods in terms of both panoptic 3D reconstruction quality and real-time inference. Code is available at https://github.com/zhen6618/EPRecon.

Via

Access Paper or Ask Questions

Completely Occluded and Dense Object Instance Segmentation Using Box Prompt-Based Segmentation Foundation Models

Jan 16, 2024

Zhen Zhou, Junfeng Fan, Yunkai Ma, Sihan Zhao, Fengshui Jing, Min Tan

Abstract:Completely occluded and dense object instance segmentation (IS) is an important and challenging task. Although current amodal IS methods can predict invisible regions of occluded objects, they are difficult to directly predict completely occluded objects. For dense object IS, existing box-based methods are overly dependent on the performance of bounding box detection. In this paper, we propose CFNet, a coarse-to-fine IS framework for completely occluded and dense objects, which is based on box prompt-based segmentation foundation models (BSMs). Specifically, CFNet first detects oriented bounding boxes (OBBs) to distinguish instances and provide coarse localization information. Then, it predicts OBB prompt-related masks for fine segmentation. To predict completely occluded object instances, CFNet performs IS on occluders and utilizes prior geometric properties, which overcomes the difficulty of directly predicting completely occluded object instances. Furthermore, based on BSMs, CFNet reduces the dependence on bounding box detection performance, improving dense object IS performance. Moreover, we propose a novel OBB prompt encoder for BSMs. To make CFNet more lightweight, we perform knowledge distillation on it and introduce a Gaussian smoothing method for teacher targets. Experimental results demonstrate that CFNet achieves the best performance on both industrial and publicly available datasets.

Via

Access Paper or Ask Questions

Linear Gaussian Bounding Box Representation and Ring-Shaped Rotated Convolution for Oriented Object Detection

Nov 14, 2023

Zhen Zhou, Yunkai Ma, Junfeng Fan, Zhaoyang Liu, Fengshui Jing, Min Tan

Abstract:In oriented object detection, current representations of oriented bounding boxes (OBBs) often suffer from boundary discontinuity problem. Methods of designing continuous regression losses do not essentially solve this problem. Although Gaussian bounding box (GBB) representation avoids this problem, directly regressing GBB is susceptible to numerical instability. We propose linear GBB (LGBB), a novel OBB representation. By linearly transforming the elements of GBB, LGBB avoids the boundary discontinuity problem and has high numerical stability. In addition, existing convolution-based rotation-sensitive feature extraction methods only have local receptive fields, resulting in slow feature aggregation. We propose ring-shaped rotated convolution (RRC), which adaptively rotates feature maps to arbitrary orientations to extract rotation-sensitive features under a ring-shaped receptive field, rapidly aggregating features and contextual information. Experimental results demonstrate that LGBB and RRC achieve state-of-the-art performance. Furthermore, integrating LGBB and RRC into various models effectively improves detection accuracy.

Via

Access Paper or Ask Questions

Deep Clustering Survival Machines with Interpretable Expert Distributions

Jan 27, 2023

Bojian Hou, Hongming Li, Zhicheng Jiao, Zhen Zhou, Hao Zheng, Yong Fan

Figure 1 for Deep Clustering Survival Machines with Interpretable Expert Distributions

Figure 2 for Deep Clustering Survival Machines with Interpretable Expert Distributions

Figure 3 for Deep Clustering Survival Machines with Interpretable Expert Distributions

Figure 4 for Deep Clustering Survival Machines with Interpretable Expert Distributions

Abstract:Conventional survival analysis methods are typically ineffective to characterize heterogeneity in the population while such information can be used to assist predictive modeling. In this study, we propose a hybrid survival analysis method, referred to as deep clustering survival machines, that combines the discriminative and generative mechanisms. Similar to the mixture models, we assume that the timing information of survival data is generatively described by a mixture of certain numbers of parametric distributions, i.e., expert distributions. We learn weights of the expert distributions for individual instances according to their features discriminatively such that each instance's survival information can be characterized by a weighted combination of the learned constant expert distributions. This method also facilitates interpretable subgrouping/clustering of all instances according to their associated expert distributions. Extensive experiments on both real and synthetic datasets have demonstrated that the method is capable of obtaining promising clustering results and competitive time-to-event predicting performance.

Via

Access Paper or Ask Questions

Diagnose Like a Radiologist: Hybrid Neuro-Probabilistic Reasoning for Attribute-Based Medical Image Diagnosis

Aug 19, 2022

Gangming Zhao, Quanlong Feng, Chaoqi Chen, Zhen Zhou, Yizhou Yu

Figure 1 for Diagnose Like a Radiologist: Hybrid Neuro-Probabilistic Reasoning for Attribute-Based Medical Image Diagnosis

Figure 2 for Diagnose Like a Radiologist: Hybrid Neuro-Probabilistic Reasoning for Attribute-Based Medical Image Diagnosis

Figure 3 for Diagnose Like a Radiologist: Hybrid Neuro-Probabilistic Reasoning for Attribute-Based Medical Image Diagnosis

Figure 4 for Diagnose Like a Radiologist: Hybrid Neuro-Probabilistic Reasoning for Attribute-Based Medical Image Diagnosis

Abstract:During clinical practice, radiologists often use attributes, e.g. morphological and appearance characteristics of a lesion, to aid disease diagnosis. Effectively modeling attributes as well as all relationships involving attributes could boost the generalization ability and verifiability of medical image diagnosis algorithms. In this paper, we introduce a hybrid neuro-probabilistic reasoning algorithm for verifiable attribute-based medical image diagnosis. There are two parallel branches in our hybrid algorithm, a Bayesian network branch performing probabilistic causal relationship reasoning and a graph convolutional network branch performing more generic relational modeling and reasoning using a feature representation. Tight coupling between these two branches is achieved via a cross-network attention mechanism and the fusion of their classification results. We have successfully applied our hybrid reasoning algorithm to two challenging medical image diagnosis tasks. On the LIDC-IDRI benchmark dataset for benign-malignant classification of pulmonary nodules in CT images, our method achieves a new state-of-the-art accuracy of 95.36\% and an AUC of 96.54\%. Our method also achieves a 3.24\% accuracy improvement on an in-house chest X-ray image dataset for tuberculosis diagnosis. Our ablation study indicates that our hybrid algorithm achieves a much better generalization performance than a pure neural network architecture under very limited training data.

* To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Via

Access Paper or Ask Questions

Brain Network Construction and Classification Toolbox (BrainNetClass)

Jun 17, 2019

Zhen Zhou, Xiaobo Chen, Yu Zhang, Lishan Qiao, Renping Yu, Gang Pan, Han Zhang, Dinggang Shen

Figure 1 for Brain Network Construction and Classification Toolbox (BrainNetClass)

Figure 2 for Brain Network Construction and Classification Toolbox (BrainNetClass)

Figure 3 for Brain Network Construction and Classification Toolbox (BrainNetClass)

Figure 4 for Brain Network Construction and Classification Toolbox (BrainNetClass)

Abstract:Brain functional network has become an increasingly used approach in understanding brain functions and diseases. Many network construction methods have been developed, whereas the majority of the studies still used static pairwise Pearson's correlation-based functional connectivity. The goal of this work is to introduce a toolbox namely "Brain Network Construction and Classification" (BrainNetClass) to the field to promote more advanced brain network construction methods. It comprises various brain network construction methods, including some state-of-the-art methods that were recently developed to capture more complex interactions among brain regions along with connectome feature extraction, reduction, parameter optimization towards network-based individualized classification. BrainNetClass is a MATLAB-based, open-source, cross-platform toolbox with graphical user-friendly interfaces for cognitive and clinical neuroscientists to perform rigorous computer-aided diagnosis with interpretable result presentations even though they do not possess neuroimage computing and machine learning knowledge. We demonstrate the implementations of this toolbox on real resting-state functional MRI datasets. BrainNetClass (v1.0) can be downloaded from https://github.com/zzstefan/BrainNetClass.

Via

Access Paper or Ask Questions

Integrating Feature and Image Pyramid: A Lung Nodule Detector Learned in Curriculum Fashion

Aug 01, 2018

Benyuan Sun, Zhen Zhou, Fandong Zhang, Xiuli Li, Yizhou Wang

Figure 1 for Integrating Feature and Image Pyramid: A Lung Nodule Detector Learned in Curriculum Fashion

Figure 2 for Integrating Feature and Image Pyramid: A Lung Nodule Detector Learned in Curriculum Fashion

Figure 3 for Integrating Feature and Image Pyramid: A Lung Nodule Detector Learned in Curriculum Fashion

Figure 4 for Integrating Feature and Image Pyramid: A Lung Nodule Detector Learned in Curriculum Fashion

Abstract:Lung nodules suffer large variation in size and appearance in CT images. Nodules less than 10mm can easily lose information after down-sampling in convolutional neural networks, which results in low sensitivity. In this paper, a combination of 3D image and feature pyramid is exploited to integrate lower-level texture features with high-level semantic features, thus leading to a higher recall. However, 3D operations are time and memory consuming, which aggravates the situation with the explosive growth of medical images. To tackle this problem, we propose a general curriculum training strategy to speed up training. An dynamic sampling method is designed to pick up partial samples which give the best contribution to network training, thus leading to much less time consuming. In experiments, we demonstrate that the proposed network outperforms previous state-of-the-art methods. Meanwhile, our sampling strategy halves the training time of the proposal network on LUNA16.

* Nodule Detection, Deep Learning, Curriculum Learning

Via

Access Paper or Ask Questions

Joint Learning for Pulmonary Nodule Segmentation, Attributes and Malignancy Prediction

Feb 10, 2018

Botong Wu, Zhen Zhou, Jianwei Wang, Yizhou Wang

Figure 1 for Joint Learning for Pulmonary Nodule Segmentation, Attributes and Malignancy Prediction

Figure 2 for Joint Learning for Pulmonary Nodule Segmentation, Attributes and Malignancy Prediction

Figure 3 for Joint Learning for Pulmonary Nodule Segmentation, Attributes and Malignancy Prediction

Figure 4 for Joint Learning for Pulmonary Nodule Segmentation, Attributes and Malignancy Prediction

Abstract:Refer to the literature of lung nodule classification, many studies adopt Convolutional Neural Networks (CNN) to directly predict the malignancy of lung nodules with original thoracic Computed Tomography (CT) and nodule location. However, these studies cannot tell how the CNN works in terms of predicting the malignancy of the given nodule, e.g., it's hard to conclude that whether the region within the nodule or the contextual information matters according to the output of the CNN. In this paper, we propose an interpretable and multi-task learning CNN -- Joint learning for \textbf{P}ulmonary \textbf{N}odule \textbf{S}egmentation \textbf{A}ttributes and \textbf{M}alignancy \textbf{P}rediction (PN-SAMP). It is able to not only accurately predict the malignancy of lung nodules, but also provide semantic high-level attributes as well as the areas of detected nodules. Moreover, the combination of nodule segmentation, attributes and malignancy prediction is helpful to improve the performance of each single task. In addition, inspired by the fact that radiologists often change window widths and window centers to help to make decision on uncertain nodules, PN-SAMP mixes multiple WW/WC together to gain information for the raw CT input images. To verify the effectiveness of the proposed method, the evaluation is implemented on the public LIDC-IDRI dataset, which is one of the largest dataset for lung nodule malignancy prediction. Experiments indicate that the proposed PN-SAMP achieves significant improvement with respect to lung nodule classification, and promising performance on lung nodule segmentation and attribute learning, compared with the-state-of-the-art methods.

* 5 papers, accepted for publication in IEEE International Symposium on Biomedical Imaging (ISBI) 2018

Via

Access Paper or Ask Questions