Qingyi Si

AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding

Mar 16, 2025
Via arXiv

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding

Dec 29, 2024
Via arXiv

Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering

Dec 19, 2024
Via arXiv

A Multi-Task Role-Playing Agent Capable of Imitating Character Linguistic Styles

Nov 04, 2024
Via arXiv

Towards Flexible Evaluation for Generative Visual Question Answering

Aug 01, 2024
Via arXiv

Multimodal Table Understanding

Jun 12, 2024
Via arXiv

Think out Loud: Emotion Deducing Explanation in Dialogues

Jun 07, 2024
Via arXiv

Are Large Language Models Table-based Fact-Checkers?

Feb 04, 2024
Via arXiv

Towards Unified Interactive Visual Grounding in The Wild

Jan 30, 2024
Via arXiv

An Empirical Study of Instruction-tuning Large Language Models in Chinese

Oct 20, 2023
Via arXiv