In recent times, an increasing number of researchers have been devoted to utilizing deep neural networks for end-to-end flight navigation. This approach has gained traction due to its ability to bridge the gap between perception and planning that exists in traditional methods, thereby eliminating delays between modules. However, the practice of replacing original modules with neural networks in a black-box manner diminishes the overall system's robustness and stability. It lacks principled explanations and often fails to consistently generate high-quality motion trajectories. Furthermore, such methods often struggle to rigorously account for the robot's kinematic constraints, resulting in the generation of trajectories that cannot be executed satisfactorily. In this work, we combine the advantages of traditional methods and neural networks by proposing an optimization-embedded neural network. This network can learn high-quality trajectories directly from visual inputs without the need of mapping, while ensuring dynamic feasibility. Here, the deep neural network is employed to directly extract environment safety regions from depth images. Subsequently, we employ a model-based approach to represent these regions as safety constraints in trajectory optimization. Leveraging the availability of highly efficient optimization algorithms, our method robustly converges to feasible and optimal solutions that satisfy various user-defined constraints. Moreover, we differentiate the optimization process, allowing it to be trained as a layer within the neural network. This approach facilitates the direct interaction between perception and planning, enabling the network to focus more on the spatial regions where optimal solutions exist. As a result, it further enhances the quality and stability of the generated trajectories. View paper on