Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuchen Sun

MAGI-1: Autoregressive Video Generation at Scale

May 19, 2025

Sand. ai, Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, W. Q. Zhang(+29 more)

Abstract:We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 facilitates controllable generation via chunk-wise prompting and supports real-time, memory-efficient deployment by maintaining constant peak inference cost, regardless of video length. The largest variant of MAGI-1 comprises 24 billion parameters and supports context lengths of up to 4 million tokens, demonstrating the scalability and robustness of our approach. The code and models are available at https://github.com/SandAI-org/MAGI-1 and https://github.com/SandAI-org/MagiAttention. The product can be accessed at https://sand.ai.

Via

Access Paper or Ask Questions

EDGE: Unknown-aware Multi-label Learning by Energy Distribution Gap Expansion

Dec 10, 2024

Yuchen Sun, Qianqian Xu, Zitai Wang, Zhiyong Yang, Junwei He

Abstract:Multi-label Out-Of-Distribution (OOD) detection aims to discriminate the OOD samples from the multi-label In-Distribution (ID) ones. Compared with its multiclass counterpart, it is crucial to model the joint information among classes. To this end, JointEnergy, which is a representative multi-label OOD inference criterion, summarizes the logits of all the classes. However, we find that JointEnergy can produce an imbalance problem in OOD detection, especially when the model lacks enough discrimination ability. Specifically, we find that the samples only related to minority classes tend to be classified as OOD samples due to the ambiguous energy decision boundary. Besides, imbalanced multi-label learning methods, originally designed for ID ones, would not be suitable for OOD detection scenarios, even producing a serious negative transfer effect. In this paper, we resort to auxiliary outlier exposure (OE) and propose an unknown-aware multi-label learning framework to reshape the uncertainty energy space layout. In this framework, the energy score is separately optimized for tail ID samples and unknown samples, and the energy distribution gap between them is expanded, such that the tail ID samples can have a significantly larger energy score than the OOD ones. What's more, a simple yet effective measure is designed to select more informative OE datasets. Finally, comprehensive experimental results on multiple multi-label and OOD datasets reveal the effectiveness of the proposed method.

* 9 pages, 5 figures, accepted by AAAI 2025

Via

Access Paper or Ask Questions

Real-time Dynamics of Soft Manipulators with Cross-section Inflation: Application to the Octopus Muscular Hydrostat

Dec 04, 2024

Yuchen Sun, Anup Teejo Mathew, Imran Afgan, Federico Renda, Cecilia Laschi

Abstract:Inspired by the embodied intelligence of biological creatures like the octopus, the soft robotic arm utilizes its highly flexible structure to perform various tasks in the complex environment. While the classic Cosserat rod theory investigates the bending, twisting, shearing, and stretching of the soft arm, it fails to capture the in-plane deformation that occurs during certain tasks, particularly those involving active lateral traction. This paper introduces an extended Cosserat rod theory addressing these limitations by incorporating an extra strain variable reflecting the in-plane inflation ratio. To accurately describe the viscoelasticity effect of the soft body in dynamics, the proposed model enhances the constitutive law by integrating the Saint-Venant Kirchhoff hyperelastic and Kelvin-Voigt viscous models. The active and environmental loads are accounted for the equations of motion, which are numerically solved by adapting the Geometric Variable Strain (GVS) approach to balance the accuracy and computational efficiency. Our contributions include the derivation of the extended Cosserat rod theory in dynamic context, and the development of a reduced-order numerical method that enables rapid and precise solutions. We demonstrate applications of the model in stiffness tuning of a soft robotic arm and the study of complex octopus' arm motions.

Via

Access Paper or Ask Questions

Neutralizing Backdoors through Information Conflicts for Large Language Models

Nov 27, 2024

Chen Chen, Yuchen Sun, Xueluan Gong, Jiaxin Gao, Kwok-Yan Lam

Figure 1 for Neutralizing Backdoors through Information Conflicts for Large Language Models

Figure 2 for Neutralizing Backdoors through Information Conflicts for Large Language Models

Figure 3 for Neutralizing Backdoors through Information Conflicts for Large Language Models

Figure 4 for Neutralizing Backdoors through Information Conflicts for Large Language Models

Abstract:Large language models (LLMs) have seen significant advancements, achieving superior performance in various Natural Language Processing (NLP) tasks, from understanding to reasoning. However, they remain vulnerable to backdoor attacks, where models behave normally for standard queries but generate harmful responses or unintended output when specific triggers are activated. Existing backdoor defenses often suffer from drawbacks that they either focus on detection without removal, rely on rigid assumptions about trigger properties, or prove to be ineffective against advanced attacks like multi-trigger backdoors. In this paper, we present a novel method to eliminate backdoor behaviors from LLMs through the construction of information conflicts using both internal and external mechanisms. Internally, we leverage a lightweight dataset to train a conflict model, which is then merged with the backdoored model to neutralize malicious behaviors by embedding contradictory information within the model's parametric memory. Externally, we incorporate convincing contradictory evidence into the prompt to challenge the model's internal backdoor knowledge. Experimental results on classification and conversational tasks across 4 widely used LLMs demonstrate that our method outperforms 8 state-of-the-art backdoor defense baselines. We can reduce the attack success rate of advanced backdoor attacks by up to 98% while maintaining over 90% clean data accuracy. Furthermore, our method has proven to be robust against adaptive backdoor attacks. The code will be open-sourced upon publication.

Via

Access Paper or Ask Questions

HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection

Jul 31, 2024

Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, Yuchen Sun, Qingming Huang

Abstract:With the progressive advancements in deep graph learning, out-of-distribution (OOD) detection for graph data has emerged as a critical challenge. While the efficacy of auxiliary datasets in enhancing OOD detection has been extensively studied for image and text data, such approaches have not yet been explored for graph data. Unlike Euclidean data, graph data exhibits greater diversity but lower robustness to perturbations, complicating the integration of outliers. To tackle these challenges, we propose the introduction of \textbf{H}ybrid External and Internal \textbf{G}raph \textbf{O}utlier \textbf{E}xposure (HGOE) to improve graph OOD detection performance. Our framework involves using realistic external graph data from various domains and synthesizing internal outliers within ID subgroups to address the poor robustness and presence of OOD samples within the ID class. Furthermore, we develop a boundary-aware OE loss that adaptively assigns weights to outliers, maximizing the use of high-quality OOD samples while minimizing the impact of low-quality ones. Our proposed HGOE framework is model-agnostic and designed to enhance the effectiveness of existing graph OOD detection models. Experimental results demonstrate that our HGOE framework can significantly improve the performance of existing OOD detection models across all 8 real datasets.

* Proceedings of the 32nd ACM International Conference on Multimedia

Via

Access Paper or Ask Questions

Neural Fluidic System Design and Control with Differentiable Simulation

May 22, 2024

Yifei Li, Yuchen Sun, Pingchuan Ma, Eftychios Sifakis, Tao Du, Bo Zhu, Wojciech Matusik

Figure 1 for Neural Fluidic System Design and Control with Differentiable Simulation

Figure 2 for Neural Fluidic System Design and Control with Differentiable Simulation

Figure 3 for Neural Fluidic System Design and Control with Differentiable Simulation

Figure 4 for Neural Fluidic System Design and Control with Differentiable Simulation

Abstract:We present a novel framework to explore neural control and design of complex fluidic systems with dynamic solid boundaries. Our system features a fast differentiable Navier-Stokes solver with solid-fluid interface handling, a low-dimensional differentiable parametric geometry representation, a control-shape co-design algorithm, and gym-like simulation environments to facilitate various fluidic control design applications. Additionally, we present a benchmark of design, control, and learning tasks on high-fidelity, high-resolution dynamic fluid environments that pose challenges for existing differentiable fluid simulators. These tasks include designing the control of artificial hearts, identifying robotic end-effector shapes, and controlling a fluid gate. By seamlessly incorporating our differentiable fluid simulator into a learning framework, we demonstrate successful design, control, and learning results that surpass gradient-free solutions in these benchmark tasks.

Via

Access Paper or Ask Questions

Deep Intellectual Property: A Survey

Apr 28, 2023

Yuchen Sun, Tianpeng Liu, Panhe Hu, Qing Liao, Shouling Ji, Nenghai Yu, Deke Guo, Li Liu

Abstract:With the widespread application in industrial manufacturing and commercial services, well-trained deep neural networks (DNNs) are becoming increasingly valuable and crucial assets due to the tremendous training cost and excellent generalization performance. These trained models can be utilized by users without much expert knowledge benefiting from the emerging ''Machine Learning as a Service'' (MLaaS) paradigm. However, this paradigm also exposes the expensive models to various potential threats like model stealing and abuse. As an urgent requirement to defend against these threats, Deep Intellectual Property (DeepIP), to protect private training data, painstakingly-tuned hyperparameters, or costly learned model weights, has been the consensus of both industry and academia. To this end, numerous approaches have been proposed to achieve this goal in recent years, especially to prevent or discover model stealing and unauthorized redistribution. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field. More than 190 research contributions are included in this survey, covering many aspects of Deep IP Protection: challenges/threats, invasive solutions (watermarking), non-invasive solutions (fingerprinting), evaluation metrics, and performance. We finish the survey by identifying promising directions for future research.

* 38 pages, 12 figures

Via

Access Paper or Ask Questions

WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments

Oct 14, 2022

Xi Chen, Tianyu Shi, Qingpeng Zhao, Yuchen Sun, Yunfei Gao, Xiangjun Wang

Figure 1 for WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments

Figure 2 for WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments

Figure 3 for WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments

Figure 4 for WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments

Abstract:Recent advances in deep reinforcement learning (RL) have demonstrated complex decision-making capabilities in simulation environments such as Arcade Learning Environment, MuJoCo, and ViZDoom. However, they are hardly extensible to more complicated problems, mainly due to the lack of complexity and variations in the environments they are trained and tested on. Furthermore, they are not extensible to an open-world environment to facilitate long-term exploration research. To learn realistic task-solving capabilities, we need to develop an environment with greater diversity and complexity. We developed WILD-SCAV, a powerful and extensible environment based on a 3D open-world FPS (First-Person Shooter) game to bridge the gap. It provides realistic 3D environments of variable complexity, various tasks, and multiple modes of interaction, where agents can learn to perceive 3D environments, navigate and plan, compete and cooperate in a human-like manner. WILD-SCAV also supports different complexities, such as configurable maps with different terrains, building structures and distributions, and multi-agent settings with cooperative and competitive tasks. The experimental results on configurable complexity, multi-tasking, and multi-agent scenarios demonstrate the effectiveness of WILD-SCAV in benchmarking various RL algorithms, as well as it is potential to give rise to intelligent agents with generalized task-solving abilities. The link to our open-sourced code can be found here https://github.com/inspirai/wilderness-scavenger.

Via

Access Paper or Ask Questions

Can Graph Neural Networks Learn to Solve MaxSAT Problem?

Nov 15, 2021

Minghao Liu, Fuqi Jia, Pei Huang, Fan Zhang, Yuchen Sun, Shaowei Cai, Feifei Ma, Jian Zhang

Figure 1 for Can Graph Neural Networks Learn to Solve MaxSAT Problem?

Figure 2 for Can Graph Neural Networks Learn to Solve MaxSAT Problem?

Figure 3 for Can Graph Neural Networks Learn to Solve MaxSAT Problem?

Figure 4 for Can Graph Neural Networks Learn to Solve MaxSAT Problem?

Abstract:With the rapid development of deep learning techniques, various recent work has tried to apply graph neural networks (GNNs) to solve NP-hard problems such as Boolean Satisfiability (SAT), which shows the potential in bridging the gap between machine learning and symbolic reasoning. However, the quality of solutions predicted by GNNs has not been well investigated in the literature. In this paper, we study the capability of GNNs in learning to solve Maximum Satisfiability (MaxSAT) problem, both from theoretical and practical perspectives. We build two kinds of GNN models to learn the solution of MaxSAT instances from benchmarks, and show that GNNs have attractive potential to solve MaxSAT problem through experimental evaluation. We also present a theoretical explanation of the effect that GNNs can learn to solve MaxSAT problem to some extent for the first time, based on the algorithmic alignment theory.

Via

Access Paper or Ask Questions

An Efficient Generation Method based on Dynamic Curvature of the Reference Curve for Robust Trajectory Planning

Dec 29, 2020

Yuchen Sun, Dongchun Ren, Shiqi Lian, Mingyu Fan, Xiangyi Teng

Figure 1 for An Efficient Generation Method based on Dynamic Curvature of the Reference Curve for Robust Trajectory Planning

Figure 2 for An Efficient Generation Method based on Dynamic Curvature of the Reference Curve for Robust Trajectory Planning

Figure 3 for An Efficient Generation Method based on Dynamic Curvature of the Reference Curve for Robust Trajectory Planning

Figure 4 for An Efficient Generation Method based on Dynamic Curvature of the Reference Curve for Robust Trajectory Planning

Abstract:Trajectory planning is a fundamental task on various autonomous driving platforms, such as social robotics and self-driving cars. Many trajectory planning algorithms use a reference curve based Frenet frame with time to reduce the planning dimension. However, there is a common implicit assumption in classic trajectory planning approaches, which is that the generated trajectory should follow the reference curve continuously. This assumption is not always true in real applications and it might cause some undesired issues in planning. One issue is that the projection of the planned trajectory onto the reference curve maybe discontinuous. Then, some segments on the reference curve are not the image of any part of the planned path. Another issue is that the planned path might self-intersect when following a simple reference curve continuously. The generated trajectories are unnatural and suboptimal ones when these issues happen. In this paper, we firstly demonstrate these issues and then introduce an efficient trajectory generation method which uses a new transformation from the Cartesian frame to Frenet frames. Experimental results on a simulated street scenario demonstrated the effectiveness of the proposed method.

* no comments

Via

Access Paper or Ask Questions