The application of closed-loop approaches in systems neuroscience and therapeutic stimulation holds great promise for revolutionizing our understanding of the brain and for developing novel neuromodulation therapies to restore lost functions. Neural prostheses capable of multi-channel neural recording, on-site signal processing, rapid symptom detection, and closed-loop stimulation are critical to enabling such novel treatments. However, the existing closed-loop neuromodulation devices are too simplistic and lack sufficient on-chip processing and intelligence. In this paper, we first discuss both commercial and investigational closed-loop neuromodulation devices for brain disorders. Next, we review state-of-the-art neural prostheses with on-chip machine learning, focusing on application-specific integrated circuits (ASIC). System requirements, performance and hardware comparisons, design trade-offs, and hardware optimization techniques are discussed. To facilitate a fair comparison and guide design choices among various on-chip classifiers, we propose a new energy-area (E-A) efficiency figure of merit that evaluates hardware efficiency and multi-channel scalability. Finally, we present several techniques to improve the key design metrics of tree-based on-chip classifiers, both in the context of ensemble methods and oblique structures.