Yi Chen

Refer to the report for detailed contributions

ITPNet: Towards Instantaneous Trajectory Prediction for Autonomous Driving

Dec 10, 2024

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios

Dec 05, 2024

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Dec 05, 2024

HunyuanVideo: A Systematic Framework For Large Video Generative Models

Dec 03, 2024

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation

Nov 25, 2024

Decoupling Layout from Glyph in Online Chinese Handwriting Generation

Oct 03, 2024

Hard-Label Cryptanalytic Extraction of Neural Network Models

Sep 18, 2024

Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction

Sep 02, 2024

Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information

Sep 02, 2024

Training-free Long Video Generation with Chain of Diffusion Model Experts

Aug 27, 2024