Tong Yang

Boston College

Inference-to-complete: A High-performance and Programmable Data-plane Co-processor for Neural-network-driven Traffic Analysis

Nov 01, 2024

Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment

Oct 28, 2024

BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching

Oct 24, 2024

LiNo: Advancing Recursive Residual Decomposition of Linear and Nonlinear Patterns for Robust Time Series Forecasting

Oct 22, 2024

INT-FlashAttention: Enabling Flash Attention for INT8 Quantization

Sep 26, 2024

Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview

Sep 18, 2024

In-Context Learning with Representations: Contextual Generalization of Trained Transformers

Aug 19, 2024

An Efficient Inference Framework for Early-exit Large Language Models

Jul 25, 2024

HERA: High-efficiency Matrix Compression via Element Replacement

Jul 04, 2024

Cephalometric Landmark Detection across Ages with Prototypical Network

Jun 18, 2024