Kai Hu

Distilling Knowledge from Heterogeneous Architectures for Semantic Segmentation

Apr 10, 2025, via arXiv

UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis

Mar 20, 2025, via arXiv

DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering

Mar 20, 2025, via arXiv

M-LLM Based Video Frame Selection for Efficient Video Understanding

Feb 27, 2025, via arXiv

DeepSeek-V3 Technical Report

Dec 27, 2024, via arXiv

TravelAgent: Generative Agents in the Built Environment

Dec 25, 2024, via arXiv

Explicit Relational Reasoning Network for Scene Text Detection

Dec 19, 2024, via arXiv

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Dec 13, 2024, via arXiv

DocTabQA: Answering Questions from Long Documents Using Tables

Aug 21, 2024, via arXiv

Mutagenesis screen to map the functions of parameters of Large Language Models

Aug 21, 2024, via arXiv