Jiawei Wang

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Dec 13, 2024
Via arXiv

CausalMob: Causal Human Mobility Prediction with LLMs-derived Human Intentions toward Public Events

Dec 03, 2024
Via arXiv

VQ-SGen: A Vector Quantized Stroke Representation for Sketch Generation

Nov 25, 2024
Via arXiv

Tarsier: Recipes for Training and Evaluating Large Video Description Models

Jun 30, 2024
Via arXiv

DLAFormer: An End-to-End Transformer For Document Layout Analysis

May 20, 2024
Via arXiv

AMCEN: An Attention Masking-based Contrastive Event Network for Two-stage Temporal Knowledge Graph Reasoning

May 16, 2024
Via arXiv

EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech

Mar 17, 2024
Via arXiv

Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation

Feb 22, 2024
Via arXiv

Boximator: Generating Rich and Controllable Motions for Video Synthesis

Feb 02, 2024
Via arXiv

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

Jan 22, 2024
Via arXiv