Heterogeneous information networks (HINs), a type of complex network widely seen in real-world applications, often encapsulate higher-order interactions that reflect the complex relationships among nodes and edges in real-world data. Modeling these higher-order interactions in HINs facilitates user-guided clustering by providing an informative collection of signals. Meanwhile, network motifs have been used extensively to reveal higher-order interactions and network semantics in homogeneous networks. It is therefore natural to extend the use of motifs to HINs, and we tackle the problem of user-guided clustering in HINs using motifs. We highlight the benefits of modeling higher-order interactions comprehensively rather than decomposing the complex relationships into pairwise interactions. We propose the MoCHIN model, which is applicable to arbitrary forms of HIN motifs; this generality is often necessary in HIN applications because of the rich and diverse semantics encapsulated in their heterogeneity. Because the tensor size in our model grows exponentially with the number of nodes in a motif, we propose an efficient inference algorithm for MoCHIN to overcome this curse of dimensionality. In our experiments, MoCHIN surpasses all baselines in three evaluation tasks under different metrics. Additional experiments discuss the advantage of our model when supervision is weak.