Most conventional Neural Architecture Search (NAS) approaches are limited in that they only generate architectures (network topologies) without searching for optimal parameters. While some NAS methods handle this issue by utilizing a supernet trained on a large-scale dataset such as ImageNet, they may be suboptimal if the target tasks are highly dissimilar from the dataset the supernet is trained on. To tackle this issue, we propose a novel neural network retrieval method, which retrieves the most optimal pre-trained network for a given task and constraints (e.g. number of parameters) from a model zoo. We train this framework by meta-learning a cross-modal latent space with contrastive loss, to maximize the similarity between a dataset and a network that obtains high performance on it, and minimize the similarity between an irrelevant dataset-network pair. We validate the efficacy of our method on ten real-world datasets, against existing NAS baselines. The results show that our method instantly retrieves networks that outperforms models obtained with the baselines with significantly fewer training steps to reach the target performance.