Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Daiguo Zhou

SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization

Nov 17, 2025

Xuankun Rong, Wenke Huang, Tingfeng Wang, Daiguo Zhou, Bo Du, Mang Ye

Figure 1 for SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization

Figure 2 for SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization

Figure 3 for SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization

Figure 4 for SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization

Abstract:Multimodal large language models (MLLMs) have demonstrated impressive reasoning and instruction-following capabilities, yet their expanded modality space introduces new compositional safety risks that emerge from complex text-image interactions. Such cross-modal couplings can produce unsafe semantics even when individual inputs are benign, exposing the fragile safety awareness of current MLLMs. While recent works enhance safety by guiding models to reason about potential risks, unregulated reasoning traces may compromise alignment; although Group Relative Policy Optimization (GRPO) offers self-rewarded refinement without human supervision, it lacks verifiable signals for reasoning safety. To address this, we propose SafeGRPO a self-rewarded multimodal safety alignment framework that integrates rule-governed reward construction into GRPO, enabling interpretable and verifiable optimization of reasoning safety. Built upon the constructed SafeTag-VL-3K dataset with explicit visual, textual, and combined safety tags, SafeGRPO performs step-guided safety thinking to enforce structured reasoning and behavior alignment, substantially improving multimodal safety awareness, compositional robustness, and reasoning stability across diverse benchmarks without sacrificing general capabilities.

Via

Access Paper or Ask Questions

Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence

Aug 01, 2025

Danzhen Fu, Jiagao Hu, Daiguo Zhou, Fei Wang, Zepeng Wang, Wenhua Liao

Figure 1 for Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence

Figure 2 for Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence

Figure 3 for Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence

Figure 4 for Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence

Abstract:Pedestrian detection models in autonomous driving systems often lack robustness due to insufficient representation of dangerous pedestrian scenarios in training datasets. To address this limitation, we present a novel framework for controllable pedestrian video editing in multi-view driving scenarios by integrating video inpainting and human motion control techniques. Our approach begins by identifying pedestrian regions of interest across multiple camera views, expanding detection bounding boxes with a fixed ratio, and resizing and stitching these regions into a unified canvas while preserving cross-view spatial relationships. A binary mask is then applied to designate the editable area, within which pedestrian editing is guided by pose sequence control conditions. This enables flexible editing functionalities, including pedestrian insertion, replacement, and removal. Extensive experiments demonstrate that our framework achieves high-quality pedestrian editing with strong visual realism, spatiotemporal coherence, and cross-view consistency. These results establish the proposed method as a robust and versatile solution for multi-view pedestrian video generation, with broad potential for applications in data augmentation and scenario simulation in autonomous driving.

* ICCV 2025 Workshop (HiGen)

Via

Access Paper or Ask Questions

Not All Pixels Are Equal: Learning Pixel Hardness for Semantic Segmentation

May 15, 2023

Xin Xiao, Daiguo Zhou, Jiagao Hu, Yi Hu, Yongchao Xu

Figure 1 for Not All Pixels Are Equal: Learning Pixel Hardness for Semantic Segmentation

Figure 2 for Not All Pixels Are Equal: Learning Pixel Hardness for Semantic Segmentation

Figure 3 for Not All Pixels Are Equal: Learning Pixel Hardness for Semantic Segmentation

Figure 4 for Not All Pixels Are Equal: Learning Pixel Hardness for Semantic Segmentation

Abstract:Semantic segmentation has recently witnessed great progress. Despite the impressive overall results, the segmentation performance in some hard areas (e.g., small objects or thin parts) is still not promising. A straightforward solution is hard sample mining, which is widely used in object detection. Yet, most existing hard pixel mining strategies for semantic segmentation often rely on pixel's loss value, which tends to decrease during training. Intuitively, the pixel hardness for segmentation mainly depends on image structure and is expected to be stable. In this paper, we propose to learn pixel hardness for semantic segmentation, leveraging hardness information contained in global and historical loss values. More precisely, we add a gradient-independent branch for learning a hardness level (HL) map by maximizing hardness-weighted segmentation loss, which is minimized for the segmentation head. This encourages large hardness values in difficult areas, leading to appropriate and stable HL map. Despite its simplicity, the proposed method can be applied to most segmentation methods with no and marginal extra cost during inference and training, respectively. Without bells and whistles, the proposed method achieves consistent/significant improvement (1.37% mIoU on average) over most popular semantic segmentation methods on Cityscapes dataset, and demonstrates good generalization ability across domains. The source codes are available at https://github.com/Menoly-xin/Hardness-Level-Learning .

Via

Access Paper or Ask Questions