Wentian Zhao

Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach

Nov 26, 2024

Quadratic Is Not What You Need For Multimodal Large Language Models

Oct 08, 2024

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Dec 29, 2023

Text2Layer: Layered Image Generation using Latent Diffusion Model

Jul 19, 2023

Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph

Jul 26, 2021

Video Question Answering on Screencast Tutorials

Aug 02, 2020

Weakly Supervised Object Localization with Inter-Intra Regulated CAMs

Nov 19, 2019

Weakly Supervised Localization Using Background Images

Sep 11, 2019

Relational Reasoning using Prior Knowledge for Visual Captioning

Jun 04, 2019

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos

Dec 06, 2018