Hongjie Zhang

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Apr 15, 2025
Via arXiv

ArchCAD-400K: An Open Large-Scale Architectural CAD Dataset and New Baseline for Panoptic Symbol Spotting

Apr 02, 2025
Via arXiv

Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning

Aug 27, 2024
Via arXiv

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Mar 24, 2024
Via arXiv

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Mar 22, 2024
Via arXiv

MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding

Dec 08, 2023
Via arXiv

Multi-view Feature Extraction based on Triple Contrastive Heads

Mar 22, 2023
Via arXiv

Multi-view Feature Extraction based on Dual Contrastive Head

Feb 08, 2023
Via arXiv

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

Dec 07, 2022
Via arXiv

AcceRL: Policy Acceleration Framework for Deep Reinforcement Learning

Nov 28, 2022
Via arXiv