Mapping spoken text to gestures is an important research area for robots with conversation capability. However, mapping a gesture to every spoken text a priori is impossible, especially when a response is automatically generated by a conversation agent. Knowledge of human gesture characteristics can be used to map text to some semantic space where texts with similar meanings are clustered together; then, a mapped gesture is defined for each semantic cluster (i.e., concept). Here, we discuss the practical issues of obtaining concepts for the conversation agent Rinna, which has a personalized vocabulary such as short terms. We compared the concepts obtained automatically with a natural language processing approach and manually with a sociological approach, and we identified three limitations of the former: at the semantic level with emoji and symbols; at the semantic level with slang, new words, and buzzwords; and at the pragmatic level. We consider these problems to be due to the personalized vocabulary of Rinna. To solve these issues, we propose combining manual and autogenerated approaches to map texts to a semantic space. A follow-up experiment showed that a robot gesture selected based on concepts left a better impression than a randomly selected gesture, which suggests the feasibility of applying semantic space to text-to-gesture mapping. The present work contributes insights into developing a methodology for generating gestures of a conversation agent with a personalized vocabulary.