Jun-Yan He

GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts

Nov 18, 2024
Via arXiv

POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search

Oct 15, 2024
Via arXiv

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

Jun 28, 2024
Via arXiv

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

Jun 27, 2024
Via arXiv

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Jun 17, 2024
Via arXiv

MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis

Apr 29, 2024
Via arXiv

Exploring Dynamic Transformer for Efficient Object Tracking

Mar 26, 2024
Via arXiv

DyRoNet: A Low-Rank Adapter Enhanced Dynamic Routing Network for Streaming Perception

Mar 15, 2024
Via arXiv

Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception

Mar 05, 2024
Via arXiv

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope

Jan 12, 2024
Via arXiv