Although dense local spatial-temporal features with bag-of-features representation achieve state-of-the-art performance for action recognition, the huge feature number and feature size prevent current methods from scaling up to real size problems. In this work, we investigate different types of feature sampling strategies for action recognition, namely dense sampling, uniformly random sampling and selective sampling. We propose two effective selective sampling methods using object proposal techniques. Experiments conducted on a large video dataset show that we are able to achieve better average recognition accuracy using 25% less features, through one of proposed selective sampling methods, and even remain comparable accuracy while discarding 70% features.