Recent Multiple Object Tracking (MOT) methods have gradually attempted to integrate object detection and instance re-identification (Re-ID) into a united network to form a one-stage solution. Typically, these methods use two separated branches within a single network to accomplish detection and Re-ID respectively without studying the inter-relationship between them, which inevitably impedes the tracking performance. In this paper, we propose an online multi-object tracking framework based on a hierarchical single-branch network to solve this problem. Specifically, the proposed single-branch network utilizes an improved Hierarchical Online In-stance Matching (iHOIM) loss to explicitly model the inter-relationship between object detection and Re-ID. Our novel iHOIM loss function unifies the objectives of the two sub-tasks and encourages better detection performance and feature learning even in extremely crowded scenes. Moreover, we propose to introduce the object positions, predicted by a motion model, as region proposals for subsequent object detection, where the intuition is that detection results and motion predictions can complement each other in different scenarios. Experimental results on MOT16 and MOT20 datasets show that we can achieve state-of-the-art tracking performance, and the ablation study verifies the effectiveness of each proposed component.