Haotian Tang

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

Oct 15, 2024

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Oct 14, 2024

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Oct 14, 2024

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Oct 14, 2024

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Sep 06, 2024

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Aug 21, 2024

Sparse Refinement for Efficient High-Resolution Semantic Segmentation

Jul 26, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

May 07, 2024

MoST: Multi-modality Scene Tokenization for Motion Prediction

Apr 30, 2024

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Sep 21, 2023