Huiqiang Jiang

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

Dec 13, 2024

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Sep 16, 2024

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Jul 02, 2024

Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

Jun 04, 2024

Position Engineering: Boosting Large Language Models through Positional Information Manipulation

Apr 17, 2024

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Mar 19, 2024

LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

Oct 10, 2023

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

Oct 09, 2023

End-to-End Word-Level Pronunciation Assessment with MASK Pre-training

Jun 05, 2023

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

May 31, 2023