Jihao Liu

StreamChat: Chatting with Streaming Video

Dec 11, 2024
Via arXiv

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

Jun 28, 2024
Via arXiv

Instruction-Guided Visual Masking

May 30, 2024
Via arXiv

Enhancing Vision-Language Model with Unmasked Token Alignment

May 29, 2024
Via arXiv

GLID: Pre-training a Generalist Encoder-Decoder Vision Model

Apr 11, 2024
Via arXiv

DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

Feb 28, 2024
Via arXiv

Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding

Mar 20, 2023
Via arXiv

TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers

Jul 18, 2022
Via arXiv

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP

Jul 12, 2022
Via arXiv

MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning

May 28, 2022
Via arXiv