Deep Forest is a prominent machine learning algorithm known for its high accuracy in forecasting. Compared with deep neural networks, Deep Forest has almost no multiplication operations and has better performance on small datasets. However, due to the deep structure and large forest quantity, it suffers from large amounts of calculation and memory consumption. In this paper, an efficient hardware accelerator is proposed for deep forest models, which is also the first work to implement Deep Forest on FPGA. Firstly, a delicate node computing unit (NCU) is designed to improve inference speed. Secondly, based on NCU, an efficient architecture and an adaptive dataflow are proposed, in order to alleviate the problem of node computing imbalance in the classification process. Moreover, an optimized storage scheme in this design also improves hardware utilization and power efficiency. The proposed design is implemented on an FPGA board, Intel Stratix V, and it is evaluated by two typical datasets, ADULT and Face Mask Detection. The experimental results show that the proposed design can achieve around 40x speedup compared to that on a 40 cores high performance x86 CPU.