Abstract: The growing accessibility of user-friendly image editing models, and the abuse risks that come with it, have created an urgent need for generalizable, up-to-date methods for Image Manipulation Detection and Localization (IMDL). Current IMDL research typically uses cross-dataset evaluation, where models trained on one benchmark are tested on others. However, this simplified evaluation approach conceals the fragility of existing methods when handling diverse AI-generated content (AIGC), giving a misleading impression of progress. This paper challenges this illusion by proposing NeXT-IMDL, a large-scale diagnostic benchmark designed not just to collect data, but to systematically probe the generalization boundaries of current detectors. Specifically, NeXT-IMDL categorizes AIGC-based manipulations along four fundamental axes: editing models, manipulation types, content semantics, and forgery granularity. Building on this taxonomy, NeXT-IMDL implements five rigorous cross-dimension evaluation protocols. Our extensive experiments on 11 representative models reveal a critical insight: while these models perform well in their original settings, they exhibit systemic failures and significant performance degradation under our protocols, which simulate a variety of real-world generalization scenarios. By providing this diagnostic toolkit and these findings, we aim to advance the development of truly robust, next-generation IMDL models.
Abstract: Hand-object interaction (HOI) is the fundamental link between humans and their environment, yet the dexterous and complex poses it involves present significant challenges for gesture control. Despite significant advances in AI and robotics that enable machines to understand and simulate hand-object interactions, capturing the semantics of functional grasping tasks remains a considerable challenge. While previous work can generate stable and correct 3D grasps, it is still far from achieving functional grasps because it does not consider grasp semantics. To address this challenge, we propose an innovative two-stage framework, Functional Grasp Synthesis Net (FGS-Net), for generating 3D HOI driven by functional text. This framework consists of a text-guided 3D model generator, Functional Grasp Generator (FGG), and a pose optimization strategy, Functional Grasp Refiner (FGR). FGG generates 3D models of hands and objects from text input, while FGR fine-tunes the poses using an Object Pose Approximator and energy functions, ensuring that the relative position of the hand and object aligns with human intent and remains physically plausible. Extensive experiments demonstrate that our approach achieves precise, high-quality HOI generation without requiring additional 3D annotation data.




Abstract: Integrating logical reasoning and machine learning by approximating logical inference with differentiable operators is a widely used technique in Neuro-Symbolic systems. However, some differentiable operators can introduce a significant bias during backpropagation and degrade the performance of Neuro-Symbolic learning. In this paper, we reveal that this bias, named \textit{Implication Bias}, is common in loss functions derived from fuzzy logic operators. Furthermore, we propose a simple yet effective method that transforms the biased loss functions into \textit{Reduced Implication-bias Logic Loss (RILL)} to address this problem. An empirical study shows that RILL achieves significant improvements over the biased logic loss functions, especially when the knowledge base is incomplete, and remains more robust than the compared methods when labelled data is insufficient.
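As a rough illustration of how a fuzzy implication loss can bias gradients, the sketch below works through one small case. The abstract neither defines the bias formally nor names an operator, so two things here are assumptions: the Reichenbach implication is used as the fuzzy operator, and the bias is read as the tendency of gradient descent to satisfy a rule p -> q by lowering the premise p rather than raising the consequent q. The function names are hypothetical and this is not the authors' formulation of RILL.

```python
def reichenbach_implication(p, q):
    """Fuzzy truth value of p -> q under the Reichenbach operator (assumed here)."""
    return 1.0 - p + p * q


def logic_loss(p, q):
    """Loss for the rule p -> q: one minus its truth value, i.e. p * (1 - q)."""
    return 1.0 - reichenbach_implication(p, q)


def logic_loss_grad(p, q):
    """Analytic gradient of the loss: d/dp = 1 - q, d/dq = -p."""
    return 1.0 - q, -p


def gradient_step(p, q, lr=0.1):
    """One plain gradient-descent step, clipped to the unit interval."""
    grad_p, grad_q = logic_loss_grad(p, q)

    def clip(x):
        return min(1.0, max(0.0, x))

    return clip(p - lr * grad_p), clip(q - lr * grad_q)


if __name__ == "__main__":
    p, q = 0.2, 0.1  # weakly believed premise, weakly believed consequent
    grad_p, grad_q = logic_loss_grad(p, q)
    print("loss:", logic_loss(p, q))
    print("gradient w.r.t. premise:", grad_p)
    print("gradient w.r.t. consequent:", grad_q)
    print("after one step:", gradient_step(p, q))
```

In this example the gradient on the premise is larger in magnitude than the gradient on the consequent, so the update mostly drives the premise toward false. Under the assumption above, that is the kind of shortcut a reduced-bias loss would aim to suppress.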




Abstract: Dynamic movement primitives (DMPs) are a flexible trajectory learning scheme widely used in motion generation for robotic systems. However, existing DMP-based methods mainly focus on simple go-to-goal tasks. To handle tasks beyond point-to-point motion planning, this work presents temporal-logic-guided optimization of motion primitives, namely the PIBB-TL algorithm, for complex manipulation tasks with user preferences. In particular, weighted truncated linear temporal logic (wTLTL) is incorporated into the PIBB-TL algorithm, which not only enables the encoding of complex tasks that involve a sequence of logically organized action plans with user preferences, but also provides a convenient and efficient means of designing the cost function. Black-box optimization is then adapted to identify the optimal shape parameters of the DMPs and so enable motion planning for robotic systems. The effectiveness of the PIBB-TL algorithm is demonstrated via simulation and experime