Yuxiong He

CORD: Balancing COnsistency and Rank Distillation for Robust Retrieval-Augmented Generation

Dec 19, 2024

Inference Scaling for Bridging Retrieval and Augmented Generation

Dec 14, 2024

SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation

Oct 04, 2024

STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning

Sep 10, 2024

FastPersist: Accelerating Model Checkpointing in Deep Learning

Jun 19, 2024

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

Jan 25, 2024

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

Jan 09, 2024

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Dec 18, 2023

ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers

Oct 26, 2023

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

Oct 11, 2023