Jianyu Wei

HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing

Feb 03, 2026

MiMo-V2-Flash Technical Report

Jan 08, 2026

MiMo-Audio: Audio Language Models are Few-Shot Learners

Dec 29, 2025

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs

Feb 17, 2025

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration

Aug 12, 2024

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

Jun 25, 2024

AFPQ: Asymmetric Floating Point Quantization for LLMs

Nov 03, 2023

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Aug 23, 2023