Xing Sun

LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?

Mar 10, 2025
Via arXiv

Human Cognition Inspired RAG with Knowledge Graph for Complex Problem Solving

Mar 09, 2025
Via arXiv

RocketEval: Efficient Automated LLM Evaluation via Grading Checklist

Mar 07, 2025
Via arXiv

FlowAgent: Achieving Compliance and Flexibility for Workflow Agents

Feb 20, 2025
Via arXiv

RoleMRC: A Fine-Grained Composite Benchmark for Role-Playing and Instruction-Following

Feb 17, 2025
Via arXiv

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Feb 07, 2025
Via arXiv

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

Jan 27, 2025
Via arXiv

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Jan 03, 2025
Via arXiv

Probability-density-aware Semi-supervised Learning

Dec 23, 2024
Via arXiv

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs

Dec 02, 2024
Via arXiv