Tsu-Jui Fu

STIV: Scalable Text and Image Conditioned Video Generation

Dec 10, 2024

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

May 29, 2024

From Text to Pixel: Advancing Long-Context Understanding in MLLMs

May 23, 2024

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Apr 11, 2024

Guiding Instruction-based Image Editing via Multimodal Large Language Models

Sep 29, 2023

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

Jul 12, 2023

Photoswap: Personalized Subject Swapping in Images

May 29, 2023

Text-guided 3D Human Generation from 2D Collections

May 23, 2023

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

May 18, 2023

Discriminative Diffusion Models as Few-shot Vision and Language Learners

May 18, 2023