Yuhui Xu

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Jan 31, 2025

GaLore+: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection

Dec 15, 2024

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

Oct 07, 2024

ThinK: Thinner Key Cache by Query-Driven Pruning

Jul 30, 2024

One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments

May 30, 2024

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

May 25, 2024

TerDiT: Ternary Diffusion Models with Transformers

May 23, 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models

Feb 22, 2024

QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Sep 26, 2023

Batch Normalization with Enhanced Linear Transformation

Nov 28, 2020