Human Parsing


Human parsing is the process of identifying, segmenting, and categorizing the parts of a human body, such as the head, shoulders, knees, and toes, in an image or video.

What's in the Box? Reasoning about Unseen Objects from Multimodal Cues

Jun 17, 2025 (via arXiv)

Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables

Jun 13, 2025 (via arXiv)

GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition

Jun 09, 2025 (via arXiv)

PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments

Jun 07, 2025 (via arXiv)

InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing

May 30, 2025 (via arXiv)

Chain-of-Talkers (CoTalk): Fast Human Annotation of Dense Image Captions

May 28, 2025 (via arXiv)

BodyGPS: Anatomical Positioning System

May 12, 2025 (via arXiv)

Human in the Loop Adaptive Optimization for Improved Time Series Forecasting

May 21, 2025 (via arXiv)

CHART-6: Human-Centered Evaluation of Data Visualization Understanding in Vision-Language Models

May 22, 2025 (via arXiv)

A Dataset for Spatiotemporal-Sensitive POI Question Answering

May 16, 2025 (via arXiv)