This paper addresses the problem of nonlinear filtering, i.e., computing the conditional distribution of the state of a stochastic dynamical system given a history of noisy partial observations. The primary focus is on scenarios involving degenerate likelihoods or high-dimensional states, where traditional sequential importance resampling (SIR) particle filters face the weight degeneracy issue. Our proposed method builds on an optimal transport interpretation of nonlinear filtering, leading to a simulation-based and likelihood-free algorithm that estimates the Brenier optimal transport map from the current distribution of the state to the distribution at the next time step. Our formulation allows us to harness the approximation power of neural networks to model complex and multi-modal distributions and employ stochastic optimization algorithms to enhance scalability. Extensive numerical experiments are presented that compare our method to the SIR particle filter and the ensemble Kalman filter, demonstrating the superior performance of our method in terms of sample efficiency, high-dimensional scalability, and the ability to capture complex and multi-modal distributions.