Alexander G. Hauptmann

SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition

Aug 21, 2024

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

Jun 28, 2024

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

Jun 27, 2024

MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis

Apr 29, 2024

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Oct 09, 2023

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

Jul 03, 2023

Document Entity Retrieval with Massive and Noisy Pre-training

Jun 15, 2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules

Apr 05, 2023

MAGVIT: Masked Generative Video Transformer

Dec 10, 2022

Rethinking Spatial Invariance of Convolutional Networks for Object Counting

Jun 10, 2022