Xin Li

College of Business, City University of Hong Kong, Hong Kong, China

Large Language Models for Knowledge Graph Embedding Techniques, Methods, and Challenges: A Survey

Jan 14, 2025
Via arXiv

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Jan 09, 2025
Via arXiv

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Jan 08, 2025
Via arXiv

Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents

Jan 03, 2025
Via arXiv

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Jan 03, 2025
Via arXiv

Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner

Dec 30, 2024
Via arXiv

ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval

Dec 24, 2024
Via arXiv

SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface Reconstruction

Dec 19, 2024
Via arXiv

RemoteTrimmer: Adaptive Structural Pruning for Remote Sensing Image Classification

Dec 17, 2024
Via arXiv

Adversarially robust generalization theory via Jacobian regularization for deep neural networks

Dec 17, 2024
Via arXiv