Current approaches for image-based keypoint detection on animal (including human) body and face are limited to specific keypoints and species. We address the limitation by proposing the Open-Vocabulary Keypoint Detection (OVKD) task. It aims to use text prompts to localize arbitrary keypoints of any species. To accomplish this objective, we propose Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM), which utilizes both vision and language models to harness the relationship between text and vision and thus achieve keypoint detection through associating text prompt with relevant keypoint features. Additionally, KDSM integrates domain distribution matrix matching and some special designs to reinforce the relationship between language and vision, thereby improving the model's generalizability and performance. Extensive experiments show that our proposed components bring significant performance improvements, and our overall method achieves impressive results in OVKD. Remarkably, our method outperforms the state-of-the-art few-shot keypoint detection methods using a zero-shot fashion. We will make the source code publicly accessible.