Hao Fei

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Nov 05, 2024

Unified Generative and Discriminative Training for Multi-modal Large Language Models

Nov 01, 2024

What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration

Oct 27, 2024

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image

Oct 20, 2024

A Survey of Ontology Expansion for Conversational Understanding

Oct 19, 2024

Grounding is All You Need? Dual Temporal Grounding for Video Dialog

Oct 08, 2024

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration

Sep 30, 2024

PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

Aug 18, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models

Jul 31, 2024

Revisiting Structured Sentiment Analysis as Latent Dependency Graph Parsing

Jul 05, 2024