Yizhou Zhou

Number it: Temporal Grounding Videos like Flipping Manga

Nov 15, 2024

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Oct 15, 2024

EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model

Aug 21, 2024

Enhancing Explainability of Knowledge Learning Paths: Causal Knowledge Networks

Jun 26, 2024

Visual Perception by Large Language Model's Weights

May 30, 2024

Multi-Modal Generative Embedding Model

May 29, 2024

ReGenNet: Towards Human Action-Reaction Synthesis

Mar 18, 2024

Inter-X: Towards Versatile Human-Human Interaction Analysis

Dec 26, 2023

Text-Only Image Captioning with Multi-Context Data Generation

May 29, 2023

Unsupervised Visual Representation Learning by Tracking Patches in Video

May 06, 2021