Jianhua Han

Thinking with Geometry: Active Geometry Integration for Spatial Reasoning

Feb 05, 2026
Via arXiv

SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM

Feb 03, 2026
Via arXiv

Aligning Perception, Reasoning, Modeling and Interaction: A Survey on Physical AI

Oct 06, 2025
Via arXiv

C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

Jul 22, 2025
Via arXiv

Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs

Jun 06, 2025
Via arXiv

SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning

May 25, 2025
Via arXiv

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

Apr 03, 2025
Via arXiv

SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation

Mar 09, 2025
Via arXiv

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

Mar 08, 2025
Via arXiv

TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba

Feb 21, 2025
Via arXiv