Abstract:Reinforcement learning has become an essential algorithm for generating complex robotic behaviors. However, to learn such behaviors, it is necessary to design a reward function that describes the task, which often consists of multiple objectives that needs to be balanced. This tuning process is known as reward engineering and typically involves extensive trial-and-error. In this paper, to avoid this trial-and-error process, we propose the concept of Constraints as Rewards (CaR). CaR formulates the task objective using multiple constraint functions instead of a reward function and solves a reinforcement learning problem with constraints using the Lagrangian-method. By adopting this approach, different objectives are automatically balanced, because Lagrange multipliers serves as the weights among the objectives. In addition, we will demonstrate that constraints, expressed as inequalities, provide an intuitive interpretation of the optimization target designed for the task. We apply the proposed method to the standing-up motion generation task of a six-wheeled-telescopic-legged robot and demonstrate that the proposed method successfully acquires the target behavior, even though it is challenging to learn with manually designed reward functions.
Abstract:Robot manipulation tasks by natural language instructions need common understanding of the target object between human and the robot. However, the instructions often have an interpretation ambiguity, because the instruction lacks important information, or does not express the target object correctly to complete the task. To solve this ambiguity problem, we hypothesize that "naming" the target objects in advance will reduce the ambiguity of natural language instructions. We propose a robot system and method that incorporates naming with appearance of the objects in advance, so that in the later manipulation task, instruction can be performed with its unique name to disambiguate the objects easily. To demonstrate the effectiveness of our approach, we build a system that can memorize the target objects, and show that naming the objects facilitates detection of the target objects and improves the success rate of manipulation instructions. With this method, the success rate of object manipulation task increases by 31% in ambiguous instructions.