Human Parsing


Human parsing is the task of identifying, segmenting, and categorizing the parts of a human body, such as the head, shoulders, knees, and toes, in an image or video.
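To make the definition concrete, here is a minimal sketch of what a parsing result can look like: a map that assigns one part label to every pixel. The label set, the toy 4x4 map, and the helper names `part_mask` and `part_pixel_counts` are all assumptions made for illustration; real datasets and models define their own label sets and output formats.

```python
from collections import Counter

# Hypothetical label set for illustration; real datasets define their own.
PART_NAMES = {0: "background", 1: "head", 2: "torso", 3: "arms", 4: "legs"}

# A toy 4x4 "parsing map": one part label per pixel.
parsing_map = [
    [0, 1, 1, 0],
    [3, 2, 2, 3],
    [0, 2, 2, 0],
    [0, 4, 4, 0],
]


def part_mask(label_map, label):
    """Return a binary mask (rows of 0/1) selecting one part label."""
    return [[1 if px == label else 0 for px in row] for row in label_map]


def part_pixel_counts(label_map):
    """Count how many pixels were assigned to each named part."""
    counts = Counter(px for row in label_map for px in row)
    return {PART_NAMES[label]: n for label, n in sorted(counts.items())}


if __name__ == "__main__":
    print(part_mask(parsing_map, 1))       # binary mask for the "head" label
    print(part_pixel_counts(parsing_map))  # pixels per part
```

In this representation, segmenting corresponds to producing the label map, and categorizing corresponds to the mapping from label IDs to part names.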

SLOW: Strategic Logical-inference Open Workspace for Cognitive Adaptation in AI Tutoring

Mar 30, 2026, via arXiv

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Mar 30, 2026, via arXiv

LaDy: Lagrangian-Dynamic Informed Network for Skeleton-based Action Segmentation via Spatial-Temporal Modulation

Mar 25, 2026, via arXiv

PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation

Mar 25, 2026, via arXiv

MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation

Mar 25, 2026, via arXiv

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

Mar 25, 2026, via arXiv

Learning Trajectory-Aware Multimodal Large Language Models for Video Reasoning Segmentation

Mar 23, 2026, via arXiv

A Mathematical Theory of Understanding

Mar 19, 2026, via arXiv

ALARA for Agents: Least-Privilege Context Engineering Through Portable Composable Multi-Agent Teams

Mar 20, 2026, via arXiv

UniGround: Universal 3D Visual Grounding via Training-Free Scene Parsing

Mar 09, 2026, via arXiv