Zhiwei Jia

Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward

Nov 22, 2024
Via arXiv

KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

May 28, 2023

Chain-of-Thought Predictive Control

Apr 03, 2023

MetaCLUE: Towards Comprehensive Visual Metaphors Research

Dec 19, 2022

Improving Policy Optimization with Generalist-Specialist Learning

Jun 26, 2022

Learning to Act with Affordance-Aware Multimodal Neural SLAM

Feb 04, 2022

TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance

Nov 16, 2021

LUMINOUS: Indoor Scene Generation for Embodied AI Challenges

Nov 10, 2021

IFR: Iterative Fusion Based Recognizer For Low Quality Scene Text Recognition

Aug 13, 2021

ManiSkill: Learning-from-Demonstrations Benchmark for Generalizable Manipulation Skills

Aug 09, 2021