Xiaotian Han

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

Jan 08, 2025
Via arXiv

NeuralPLexer3: Accurate Biomolecular Complex Structure Prediction with Flow Models

Dec 18, 2024
Via arXiv

Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models

Nov 25, 2024
Via arXiv

DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

Oct 24, 2024
Via arXiv

Gradient Rewiring for Editable Graph Neural Network Training

Oct 21, 2024
Via arXiv

BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data

Oct 01, 2024
Via arXiv

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

May 28, 2024
Via arXiv

ViTAR: Vision Transformer with Any Resolution

Mar 28, 2024
Via arXiv

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Mar 03, 2024
Via arXiv

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Jan 18, 2024
Via arXiv