Deep neural network models for IMU sensor-based human activity recognition (HAR) that are trained on controlled, well-curated datasets generalize poorly in practical deployments. However, data collected in naturalistic settings often contains significant label noise. In this work, we examine two in-the-wild HAR datasets and DivideMix, a state-of-the-art learning with noisy labels (LNL) method, to understand the extent and impact of noisy labels in training data. Our empirical analysis reveals that the substantial domain gaps among diverse subjects cause LNL methods to violate a key underlying assumption, namely that neural networks tend to fit simpler (and thus clean) data in early training epochs. Motivated by these insights, we design VALERIAN, an invariant feature learning method for in-the-wild wearable sensor-based HAR. By training a multi-task model with separate task-specific layers for each subject, VALERIAN allows noisy labels to be dealt with individually for each subject while still benefiting from a feature representation shared across subjects. We evaluate VALERIAN on four datasets, two collected in a controlled environment and two in the wild.
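The multi-task structure described above, a shared feature representation feeding a separate task-specific layer per subject, can be pictured with a minimal forward-pass sketch. This is an illustration only, not the VALERIAN implementation: the class name `MultiHeadHAR`, the single shared ReLU layer, the linear heads, and all dimensions are assumptions, and training and noisy-label handling are omitted.

```python
import numpy as np


class MultiHeadHAR:
    """Hypothetical sketch: shared feature layer plus one linear head per subject."""

    def __init__(self, n_features, n_hidden, n_classes, subject_ids, seed=0):
        rng = np.random.default_rng(seed)
        # Shared layer: its weights are used for every subject's data.
        self.W_shared = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
        self.b_shared = np.zeros(n_hidden)
        # Task-specific layers: independent weights for each subject.
        self.heads = {
            s: (rng.normal(0.0, 0.1, size=(n_hidden, n_classes)), np.zeros(n_classes))
            for s in subject_ids
        }

    def features(self, x):
        # Shared representation (ReLU of an affine map), independent of subject.
        return np.maximum(x @ self.W_shared + self.b_shared, 0.0)

    def predict_proba(self, x, subject_id):
        # Route the shared features through the head of the given subject.
        W, b = self.heads[subject_id]
        logits = self.features(x) @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        exp = np.exp(logits)
        return exp / exp.sum(axis=1, keepdims=True)


# Assumed toy dimensions: 6 input features per window, 4 activity classes.
model = MultiHeadHAR(n_features=6, n_hidden=16, n_classes=4, subject_ids=["s1", "s2"])
x = np.random.default_rng(1).normal(size=(5, 6))
p1 = model.predict_proba(x, "s1")
p2 = model.predict_proba(x, "s2")
```

In this sketch the same input windows yield identical shared features but different class probabilities depending on which subject's head is used, which is the property that lets label noise be handled per subject.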