Abstract:This paper introduces Motion Personalization, a new task that generates personalized motions aligned with text descriptions using several basic motions containing Persona. To support this novel task, we introduce a new large-scale motion dataset called PerMo (PersonaMotion), which captures the unique personas of multiple actors. We also propose a multi-modal finetuning method of a pretrained motion diffusion model called PersonaBooth. PersonaBooth addresses two main challenges: i) A significant distribution gap between the persona-focused PerMo dataset and the pretraining datasets, which lack persona-specific data, and ii) the difficulty of capturing a consistent persona from the motions vary in content (action type). To tackle the dataset distribution gap, we introduce a persona token to accept new persona features and perform multi-modal adaptation for both text and visuals during finetuning. To capture a consistent persona, we incorporate a contrastive learning technique to enhance intra-cohesion among samples with the same persona. Furthermore, we introduce a context-aware fusion mechanism to maximize the integration of persona cues from multiple input motions. PersonaBooth outperforms state-of-the-art motion style transfer methods, establishing a new benchmark for motion personalization.
Abstract:An imprecise region is referred to as a geographical area without a clearly-defined boundary in the literature. Previous clustering-based approaches exploit spatial information to find such regions. However, the prior studies suffer from the following two problems: the subjectivity in selecting clustering parameters and the inclusion of a large portion of the undesirable region (i.e., a large number of noise points). To overcome these problems, we present DIR-ST$^2$, a novel framework for delineating an imprecise region by iteratively performing density-based clustering, namely DBSCAN, along with not only spatio--textual information but also temporal information on social media. Specifically, we aim at finding a proper radius of a circle used in the iterative DBSCAN process by gradually reducing the radius for each iteration in which the temporal information acquired from all resulting clusters are leveraged. Then, we propose an efficient and automated algorithm delineating the imprecise region via hierarchical clustering. Experiment results show that by virtue of the significant noise reduction in the region, our DIR-ST$^2$ method outperforms the state-of-the-art approach employing one-class support vector machine in terms of the $\mathcal{F}_1$ score from comparison with precisely-defined regions regarded as a ground truth, and returns apparently better delineation of imprecise regions. The computational complexity of DIR-ST$^2$ is also analytically and numerically shown.