Accurate ultrasound segmentation is pursued because it aids clinicians in achieving a comprehensive diagnosis. Due to the presence of low image quality and high costs associated with annotation, two primary concerns arise: (1) enhancing the understanding of multi-scale features, and (2) improving the resistance to data dependency. To mitigate these concerns, we propose HCS-TNAS, a novel neural architecture search (NAS) method that automatically designs the network. For the first concern, we employ multi-level searching encompassing cellular, layer, and module levels. Specifically, we design an Efficient NAS-ViT module that searches for multi-scale tokens in the vision Transformer (ViT) to capture context and local information, rather than relying solely on simple combinations of operations. For the second concern, we propose a hybrid constraint-driven semi-supervised learning method that considers additional network independence and incorporates contrastive loss in a NAS formulation. By further developing a stage-wise optimization strategy, a rational network structure can be identified. Extensive experiments on three publicly available ultrasound image datasets demonstrate that HCS-TNAS effectively improves segmentation accuracy and outperforms state-of-the-art methods.