Abstract:In this study, we propose an approach to equip domestic robots with the ability to perform simple household tidying tasks. We focus specifically on 'knolling,' an activity related to organizing scattered items into neat and space-efficient arrangements. Unlike the uniformity of industrial environments, household settings present unique challenges due to their diverse array of items and the subjectivity of tidiness. Here, we draw inspiration from natural language processing (NLP) and utilize a transformer-based approach that predicts the next position of an item in a sequence of neatly positioned items. We integrate the knolling model with a visual perception model and a physical robot arm to demonstrate a machine that declutters and organizes a dozen freeform items of various shapes and sizes.