Mingxuan Yuan

Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models

Mar 29, 2025

Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging

Mar 26, 2025

Open3DBench: Open-Source Benchmark for 3D-IC Backend Implementation and PPA Evaluation

Mar 17, 2025

ARS: Automatic Routing Solver with Large Language Models

Feb 21, 2025

Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models

Feb 19, 2025

PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery

Feb 18, 2025

Certifying Language Model Robustness with Fuzzed Randomized Smoothing: An Efficient Defense Against Backdoor Attacks

Feb 09, 2025

KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference

Feb 06, 2025

AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference

Feb 06, 2025

CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference

Feb 06, 2025