Yi Shi

Demystify Protein Generation with Hierarchical Conditional Diffusion Models

Jul 24, 2025

Zinan Ling, Yi Shi, Da Yan, Yang Zhou, Bo Hui

Abstract:Generating novel and functional protein sequences is critical to a wide range of applications in biology. Recent advancements in conditional diffusion models have shown impressive empirical performance in protein generation tasks. However, reliable protein generation remains an open research question in de novo protein design, especially for conditional diffusion models. Since the biological function of a protein is determined by its multi-level structure, we propose a novel multi-level conditional diffusion model that integrates both sequence-based and structure-based information for efficient end-to-end protein design guided by specified functions. By generating representations at different levels simultaneously, our framework can effectively model the inherent hierarchical relations between levels, resulting in an informative and discriminative representation of the generated protein. We also propose Protein-MMD, a new reliable evaluation metric for assessing the quality of proteins generated by conditional diffusion models. Our new metric is able to capture both distributional and functional similarities between real and generated protein sequences while ensuring conditional consistency. We experiment with benchmark datasets, and the results on conditional protein generation tasks demonstrate the efficacy of the proposed generation framework and evaluation metric.
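
The abstract does not define Protein-MMD, so the following is only a minimal sketch of the generic maximum mean discrepancy (MMD) its name refers to. It assumes each protein has already been mapped to a fixed-length embedding vector by some encoder; the encoder, the Gaussian kernel and the bandwidth `sigma` are all assumptions, not details from the paper. Lower values mean the real and generated embedding distributions are closer.

```python
import numpy as np


def rbf_kernel(x: np.ndarray, y: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd_squared(real: np.ndarray, generated: np.ndarray, sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two embedding sets."""
    k_rr = rbf_kernel(real, real, sigma)
    k_gg = rbf_kernel(generated, generated, sigma)
    k_rg = rbf_kernel(real, generated, sigma)
    return float(k_rr.mean() + k_gg.mean() - 2.0 * k_rg.mean())


# Toy usage: hypothetical 4-dimensional "protein embeddings".
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
same = rng.normal(size=(200, 4))
shifted = rng.normal(loc=2.0, size=(200, 4))
print(mmd_squared(real, same, sigma=2.0) < mmd_squared(real, shifted, sigma=2.0))  # True
```

The biased estimator is used for brevity; an unbiased variant drops the diagonal terms of `k_rr` and `k_gg`. A metric like the one described would additionally need conditioning on the target function, which this sketch does not attempt.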

NeurNCD: Novel Class Discovery via Implicit Neural Representation

Jun 06, 2025

Junming Wang, Yi Shi

Abstract:Discovering novel classes in open-world settings is crucial for real-world applications. Traditional explicit representations, such as object descriptors or 3D segmentation maps, are constrained by their discrete, hole-prone, and noisy nature, which hinders accurate novel class discovery. To address these challenges, we introduce NeurNCD, the first versatile and data-efficient framework for novel class discovery that employs the meticulously designed Embedding-NeRF model combined with KL divergence as a substitute for traditional explicit 3D segmentation maps to aggregate semantic embedding and entropy in visual embedding space. NeurNCD also integrates several key components, including feature query, feature modulation and clustering, facilitating efficient feature augmentation and information exchange between the pre-trained semantic segmentation network and implicit neural representations. As a result, our framework achieves superior segmentation performance in both open and closed-world settings without relying on densely labelled datasets for supervised training or human interaction to generate sparse label supervision. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches on the NYUv2 and Replica datasets.

* Accepted by ICMR 2024

ELGAR: Expressive Cello Performance Motion Generation for Audio Rendition

May 07, 2025

Zhiping Qiu, Yitong Jin, Yuan Wang, Yi Shi, Chongwu Wang, Chao Tan, Xiaobing Li, Feng Yu, Tao Yu, Qionghai Dai

Abstract:The art of instrument performance stands as a vivid manifestation of human creativity and emotion. Nonetheless, generating instrument performance motions is a highly challenging task, as it requires not only capturing intricate movements but also reconstructing the complex dynamics of the performer-instrument interaction. While existing works primarily focus on modeling partial body motions, we propose Expressive ceLlo performance motion Generation for Audio Rendition (ELGAR), a state-of-the-art diffusion-based framework for whole-body fine-grained instrument performance motion generation solely from audio. To emphasize the interactive nature of the instrument performance, we introduce Hand Interactive Contact Loss (HICL) and Bow Interactive Contact Loss (BICL), which effectively guarantee the authenticity of the interplay. Moreover, to better evaluate whether the generated motions align with the semantic context of the music audio, we design novel metrics specifically for string instrument performance motion generation, including finger-contact distance, bow-string distance, and bowing score. Extensive evaluations and ablation studies are conducted to validate the efficacy of the proposed methods. In addition, we put forward a motion generation dataset, SPD-GEN, collated and normalized from the MoCap dataset SPD. As demonstrated, ELGAR shows great potential in generating instrument performance motions with complicated and fast interactions, which will promote further development in areas such as animation, music education, and interactive art creation.

* SIGGRAPH 2025

PARC: Physics-based Augmentation with Reinforcement Learning for Character Controllers

May 06, 2025

Michael Xu, Yi Shi, KangKang Yin, Xue Bin Peng

Abstract:Humans excel at navigating diverse, complex environments with agile motor skills, exemplified by parkour practitioners performing dynamic maneuvers, such as climbing up walls and jumping across gaps. Reproducing these agile movements with simulated characters remains challenging, in part due to the scarcity of motion capture data for agile terrain traversal behaviors and the high cost of acquiring such data. In this work, we introduce PARC (Physics-based Augmentation with Reinforcement Learning for Character Controllers), a framework that leverages machine learning and physics-based simulation to iteratively augment motion datasets and expand the capabilities of terrain traversal controllers. PARC begins by training a motion generator on a small dataset consisting of core terrain traversal skills. The motion generator is then used to produce synthetic data for traversing new terrains. However, these generated motions often exhibit artifacts, such as incorrect contacts or discontinuities. To correct these artifacts, we train a physics-based tracking controller to imitate the motions in simulation. The corrected motions are then added to the dataset, which is used to continue training the motion generator in the next iteration. PARC's iterative process jointly expands the capabilities of the motion generator and tracker, creating agile and versatile models for interacting with complex environments. PARC provides an effective approach to develop controllers for agile terrain traversal, which bridges the gap between the scarcity of motion data and the need for versatile character controllers.

* SIGGRAPH Conference Papers 2025
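
The iterative loop described in the abstract can be outlined as a framework-agnostic skeleton. This is a hedged sketch, not PARC's code: `train_generator`, `train_tracker`, and the convention that a failed simulation returns `None` (and is discarded) are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Optional, TypeVar

M = TypeVar("M")  # a motion clip; its concrete representation is left open


def augment_iteratively(
    dataset: List[M],
    train_generator: Callable[[List[M]], Callable[[int], List[M]]],
    train_tracker: Callable[[List[M]], Callable[[M], Optional[M]]],
    iterations: int,
    samples_per_iteration: int,
) -> List[M]:
    """Hypothetical outline of a generate -> physics-correct -> retrain loop."""
    data = list(dataset)  # copy so the caller's dataset is not mutated
    for _ in range(iterations):
        # 1. Fit the kinematic motion generator on the current dataset.
        generate = train_generator(data)
        # 2. Sample synthetic motions (e.g. for new terrains); these may contain artifacts.
        synthetic = generate(samples_per_iteration)
        # 3. Train a physics-based tracker to imitate the synthetic motions in simulation.
        track = train_tracker(synthetic)
        # 4. Keep only the physically corrected clips (None means tracking failed).
        corrected = [c for c in (track(m) for m in synthetic) if c is not None]
        # 5. Grow the dataset; the next iteration retrains on the larger set.
        data.extend(corrected)
    return data
```

With integer "motions", a toy generator that proposes `max(data) + 1, max(data) + 2, ...` and a toy tracker that only "succeeds" on even numbers, two iterations of four samples starting from `[0]` grow the dataset to `[0, 2, 4, 6, 8]`. Passing the training steps in as callables keeps the control flow visible without committing to any particular learning or simulation library.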

StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data

May 06, 2025

Yuxuan Mu, Hung Yu Ling, Yi Shi, Ismael Baira Ojeda, Pengcheng Xi, Chang Shu, Fabio Zinno, Xue Bin Peng

Abstract:Motion capture (mocap) data often exhibits visually jarring artifacts due to inaccurate sensors and post-processing. Cleaning this corrupted data can require substantial manual effort from human experts, which can be a costly and time-consuming process. Previous data-driven motion cleanup methods offer the promise of automating this cleanup process, but often require in-domain paired corrupted-to-clean training data. Constructing such paired datasets requires access to high-quality, relatively artifact-free motion clips, which often necessitates laborious manual cleanup. In this work, we present StableMotion, a simple yet effective method for training motion cleanup models directly from unpaired corrupted datasets that need cleanup. The core component of our method is the introduction of motion quality indicators, which can be easily annotated through manual labeling or heuristic algorithms and enable training of quality-aware motion generation models on raw motion data with mixed quality. At test time, the model can be prompted to generate high-quality motions using the quality indicators. Our method can be implemented through a simple diffusion-based framework, leading to a unified motion generate-discriminate model, which can be used to both identify and fix corrupted frames. We demonstrate that our proposed method is effective for training motion cleanup models on raw mocap data in production scenarios by applying StableMotion to SoccerMocap, a 245-hour soccer mocap dataset containing real-world motion artifacts. The trained model effectively corrects a wide range of motion artifacts, reducing motion pops and frozen frames by 68% and 81%, respectively. See https://youtu.be/3Y7MMAH02B4 for more results.

* 17 pages, 13 figures
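
The abstract says quality indicators can be annotated by heuristic algorithms but does not give them. Below is a minimal sketch of one possible per-frame heuristic, assuming joint positions in meters sampled at a fixed frame rate: a frame is marked corrupted if no joint moves at all (a frozen frame) or if any joint moves implausibly fast (a pop). The function name and the thresholds `frozen_speed` and `pop_speed` are illustrative assumptions, not values from the paper.

```python
import numpy as np


def heuristic_quality_labels(
    positions: np.ndarray,
    fps: float = 30.0,
    frozen_speed: float = 1e-3,
    pop_speed: float = 10.0,
) -> np.ndarray:
    """Label each frame of a (T, J, 3) joint-position clip: True = clean, False = corrupted.

    Speeds are in m/s. Both thresholds are illustrative guesses, not tuned values.
    """
    # Per-joint speed between consecutive frames, shape (T - 1, J).
    step_speed = np.linalg.norm(np.diff(positions, axis=0), axis=-1) * fps
    # Give frame 0 the speed of the first step so every frame gets a label, shape (T, J).
    speed = np.concatenate([step_speed[:1], step_speed], axis=0)
    fastest_joint = speed.max(axis=1)
    frozen = fastest_joint < frozen_speed  # no joint moved: likely a duplicated frame
    pop = fastest_joint > pop_speed        # some joint jumped implausibly far in one frame
    return ~(frozen | pop)
```

A performer who genuinely holds still would also be flagged as frozen here, which is one reason such heuristic labels are only a cheap starting point compared with manual annotation.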

SpeHeatal: A Cluster-Enhanced Segmentation Method for Sperm Morphology Analysis

Feb 18, 2025

Yi Shi, Yunkai Wang, Xupeng Tian, Tieyi Zhang, Bing Yao, Hui Wang, Yong Shao, Cencen Wang, Rong Zeng

Abstract:The accurate assessment of sperm morphology is crucial in andrological diagnostics, where the segmentation of sperm images presents significant challenges. Existing approaches frequently rely on large annotated datasets and often struggle with the segmentation of overlapping sperm and the presence of dye impurities. To address these challenges, this paper first analyzes the issue of overlapping sperm tails from a geometric perspective and introduces a novel clustering algorithm, Con2Dis, which effectively segments overlapping tails by considering three essential factors: CONnectivity, CONformity, and DIStance. Building on this foundation, we propose an unsupervised method, SpeHeatal, designed for the comprehensive segmentation of the SPErm HEAd and TAiL. SpeHeatal employs the Segment Anything Model (SAM) to generate masks for sperm heads while filtering out dye impurities, utilizes Con2Dis to segment tails, and then applies a tailored mask splicing technique to produce complete sperm masks. Experimental results underscore the superior performance of SpeHeatal, particularly in handling images with overlapping sperm.

* AAAI 2025

Be Cautious When Merging Unfamiliar LLMs: A Phishing Model Capable of Stealing Privacy

Feb 17, 2025

Zhenyuan Guo, Yi Shi, Wenlong Meng, Chen Gong, Chengkun Wei, Wenzhi Chen

Abstract:Model merging is a widespread technology in large language models (LLMs) that integrates multiple task-specific LLMs into a unified one, enabling the merged model to inherit the specialized capabilities of these LLMs. Most task-specific LLMs are sourced from open-source communities and have not undergone rigorous auditing, potentially imposing risks in model merging. This paper highlights an overlooked privacy risk: an unsafe model could compromise the privacy of other LLMs involved in the model merging. Specifically, we propose PhiMM, a privacy attack approach that trains a phishing model capable of stealing privacy using a crafted privacy phishing instruction dataset. Furthermore, we introduce a novel model cloaking method that mimics a specialized capability to conceal attack intent, luring users into merging the phishing model. Once victims merge the phishing model, the attacker can extract personally identifiable information (PII) or infer membership information (MI) by querying the merged model with the phishing instruction. Experimental results show that merging a phishing model increases the risk of privacy breaches. Compared to the results before merging, PII leakage increased by 3.9% and MI leakage increased by 17.4% on average. We release the code of PhiMM through a link.
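
For readers unfamiliar with model merging, here is a minimal sketch of one common scheme, task arithmetic, over parameter dictionaries of NumPy arrays. It is not PhiMM and not necessarily any merging method evaluated in the paper; it only illustrates why every contributed model's weight delta, including a phishing model's, is carried into the merged weights. The parameter names and the `scale` coefficient are illustrative assumptions.

```python
from typing import Dict, List

import numpy as np

Params = Dict[str, np.ndarray]


def task_arithmetic_merge(base: Params, finetuned: List[Params], scale: float = 1.0) -> Params:
    """Merge fine-tuned models that share `base`'s architecture:
    theta_merged = theta_base + scale * sum_i (theta_i - theta_base)."""
    merged: Params = {}
    for name, base_w in base.items():
        # Task vector of each fine-tuned model for this tensor.
        deltas = [model[name] - base_w for model in finetuned]
        merged[name] = base_w + scale * np.sum(deltas, axis=0)
    return merged


# Toy usage with a single hypothetical two-element weight tensor.
base = {"w": np.zeros(2)}
model_a = {"w": np.array([1.0, 0.0])}
model_b = {"w": np.array([0.0, 2.0])}
print(task_arithmetic_merge(base, [model_a, model_b], scale=0.5)["w"])
```

Because the merge is a plain weighted sum of deltas, nothing in it inspects what a contributed delta actually does, which is the gap that an unaudited phishing model can exploit.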

Robust Long-Range Perception Against Sensor Misalignment in Autonomous Vehicles

Aug 20, 2024

Zi-Xiang Xia, Sudeep Fadadu, Yi Shi, Louis Foucard

Abstract:Advances in machine learning algorithms for sensor fusion have significantly improved the detection and prediction of other road users, thereby enhancing safety. However, even a small angular displacement in the sensor's placement can cause significant degradation in output, especially at long range. In this paper, we demonstrate a simple yet generic and efficient multi-task learning approach that not only detects misalignment between different sensor modalities but is also robust to it for long-range perception. Along with the amount of misalignment, our method also predicts calibrated uncertainty, which can be useful for filtering and fusing predicted misalignment values over time. In addition, we show that the predicted misalignment parameters can be used for self-correcting input sensor data, further improving the perception performance under sensor misalignment.
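
Two pieces of arithmetic make the abstract's points concrete. First, a fixed angular error produces a lateral offset that grows linearly with range, which is why long-range perception suffers most. Second, per-frame misalignment predictions that come with variances can be fused over time. The abstract does not say which fusion rule the authors use; inverse-variance weighting (the minimum-variance linear combination of independent estimates) is an assumption here, and both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def lateral_offset_m(range_m: float, yaw_error_deg: float) -> float:
    """Lateral displacement of a point at `range_m` caused by a yaw misalignment."""
    return range_m * math.tan(math.radians(yaw_error_deg))


def fuse_estimates(means: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted fusion of independent misalignment estimates.

    Returns the fused mean and its variance; lower-variance predictions count more.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, means)) / total
    return mean, 1.0 / total
```

For example, a 0.2° yaw error shifts a point by about 0.17 m at 50 m but by about 0.70 m at 200 m. Fusing two estimates of 1.0 and 3.0 with variances 1.0 and 3.0 gives a mean of 1.5 with variance 0.75, pulled toward the more confident prediction.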

CS3: Cascade SAM for Sperm Segmentation

Jul 04, 2024

Yi Shi, Xu-Peng Tian, Yun-Kai Wang, Tie-Yi Zhang, Bin Yao, Hui Wang, Yong Shao, Cen-Cen Wang, Rong Zeng, De-Chuan Zhan

Abstract:Automated sperm morphology analysis plays a crucial role in the assessment of male fertility, yet its efficacy is often compromised by the challenges in accurately segmenting sperm images. Existing segmentation techniques, including the Segment Anything Model (SAM), are notably inadequate in addressing the complex issue of sperm overlap, a frequent occurrence in clinical samples. Our exploratory studies reveal that modifying image characteristics by removing sperm heads and easily segmentable areas, alongside enhancing the visibility of overlapping regions, markedly enhances SAM's efficiency in segmenting intricate sperm structures. Motivated by these findings, we present the Cascade SAM for Sperm Segmentation (CS3), an unsupervised approach specifically designed to tackle the issue of sperm overlap. This method employs a cascade application of SAM to segment sperm heads, simple tails, and complex tails in stages. Subsequently, these segmented masks are meticulously matched and joined to construct complete sperm masks. In collaboration with leading medical institutions, we have compiled a dataset comprising approximately 2,000 unlabeled sperm images to fine-tune our method, and secured expert annotations for an additional 240 images to facilitate comprehensive model assessment. Experimental results demonstrate superior performance of CS3 compared to existing methods.

Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

Jun 07, 2024

Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao (+15 more)

Abstract:The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the training tasks. The inability to quickly identify the faulty components results in a substantial waste of GPU resources. Secondly, since GPUs must wait for parameter synchronization to complete before proceeding to the next round of computation, network congestion can greatly increase the waiting time for GPUs. To address these challenges, this paper introduces a communication-driven solution, namely C4. The key insights of C4 are twofold. First, in parallel training, collective communication exhibits periodic and homogeneous characteristics, so any anomalies are certainly due to some form of hardware malfunction. By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection. Second, the predictable communication model of collective communication, involving few large flows, allows C4 to efficiently execute traffic planning, substantially reducing network congestion. C4 has been extensively deployed across our production systems, cutting error-induced overhead by roughly 30% and enhancing runtime performance by about 15% for certain applications with moderate communication costs.
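
The first insight, that collective communication is periodic and homogeneous so outliers point to faulty hardware, suggests a simple robust outlier test. The sketch below flags ranks whose time for the same collective step lies far above the median, measured against the median absolute deviation (MAD). The abstract does not describe C4's actual detection rule; the per-rank timing input, the `threshold`, and the `min_spread_ms` floor are assumptions for illustration.

```python
import statistics
from typing import Dict, List


def flag_slow_ranks(
    durations_ms: Dict[int, float],
    threshold: float = 5.0,
    min_spread_ms: float = 0.5,
) -> List[int]:
    """Return ranks whose duration for one collective step is an upper outlier.

    Because every rank runs the same collective, healthy ranks should report
    nearly identical timings; a robust score against the median exposes the rest.
    """
    values = list(durations_ms.values())
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    # 1.4826 * MAD estimates the standard deviation under Gaussian noise; the floor
    # avoids flagging tiny jitter when most ranks report identical timings.
    spread = max(1.4826 * mad, min_spread_ms)
    return sorted(
        rank for rank, v in durations_ms.items() if (v - median) > threshold * spread
    )
```

With timings `{0: 10.0, 1: 10.2, 2: 9.9, 3: 10.1, 4: 25.0}` only rank 4 is flagged. Only slow outliers are reported because the concern here is added delay; the median and MAD are used instead of the mean and standard deviation so that a single faulty rank cannot inflate the baseline it is compared against.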
