
Xia Hu

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Jan 04, 2026

BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding

Dec 12, 2025

Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning

Jun 11, 2025

AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models

May 28, 2025

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

Apr 15, 2025

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Mar 20, 2025

You Only Debias Once: Towards Flexible Accuracy-Fairness Trade-offs at Inference Time

Mar 10, 2025

More for Keys, Less for Values: Adaptive KV Cache Quantization

Feb 20, 2025

Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization

Feb 06, 2025

Survey and Improvement Strategies for Gene Prioritization with Large Language Models

Jan 30, 2025