
Minjia Zhang

MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache

Nov 28, 2024

Transforming the Hybrid Cloud for Emerging AI Workloads

Nov 20, 2024

Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment

Nov 05, 2024

Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions

Aug 01, 2024

Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks

Jul 11, 2024

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

Jul 07, 2024

Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training

Jun 27, 2024

Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

Jan 17, 2024

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

Oct 11, 2023