Bo Peng

Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data

Oct 22, 2024

SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search

Oct 12, 2024

Dark Miner: Defend against unsafe generation for text-to-image diffusion models

Sep 26, 2024

FSL-LVLM: Friction-Aware Safety Locomotion using Large Vision Language Model in Wheeled Robots

Sep 15, 2024

Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM

Sep 14, 2024

VI3DRM: Towards meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis

Sep 12, 2024

CLIP-AGIQA: Boosting the Performance of AI-Generated Image Quality Assessment with CLIP

Aug 27, 2024

S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis

Aug 18, 2024

SWaT: Statistical Modeling of Video Watch Time through User Behavior Analysis

Aug 14, 2024

TextIM: Part-aware Interactive Motion Synthesis from Text

Aug 06, 2024