Peiyuan Zhang

Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile

Feb 10, 2025
Via arXiv

Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances

Feb 07, 2025
Via arXiv

Fast Video Generation with Sliding Tile Attention

Feb 06, 2025
Via arXiv

PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection

Jan 23, 2025
Via arXiv

Criteria and Bias of Parameterized Linear Regression under Edge of Stability Regime

Dec 11, 2024
Via arXiv

Temporal Reasoning Transfer from Text to Video

Oct 08, 2024
Via arXiv

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Jul 17, 2024
Via arXiv

Long Context Transfer from Language to Vision

Jun 24, 2024
Via arXiv

TinyLlama: An Open-Source Small Language Model

Jan 04, 2024
Via arXiv

OtterHD: A High-Resolution Multi-modality Model

Nov 07, 2023
Via arXiv