Abstract: This paper proposes Detrive, a novel Transformer-based end-to-end autonomous driving model. Detrive addresses a limitation of earlier end-to-end models, which cannot detect the position and size of traffic participants. It uses an end-to-end Transformer-based detection model as its perception module, a multi-layer perceptron as its feature fusion network, a recurrent neural network with gated recurrent units (GRU) for path planning, and two controllers for the vehicle's forward speed and turning angle. The model is trained with an online imitation learning method. To obtain a better training set, a reinforcement learning agent that reads the ground-truth bird's-eye-view map directly from the CARLA simulator as its perceptual output serves as the teacher for imitation learning. The trained model is evaluated on the CARLA autonomous driving benchmark. The results show that the end-to-end model based on a Transformer detector has a clear advantage in dynamic obstacle avoidance over traditional end-to-end models based on a classifier.
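As an illustration of the final stage described in the abstract (two controllers for forward speed and turning angle), the following is a minimal sketch that turns planned waypoints into throttle and steering commands. The abstract does not state the controller type, so the use of PID controllers here is an assumption, as are the names `PID` and `control_from_waypoints`, the gains, the time steps, and the ego-frame convention (x forward, with positive y and positive steer pointing to the same side).

```python
import math
from dataclasses import dataclass, field


@dataclass
class PID:
    """Textbook discrete PID controller (assumed; not confirmed by the paper)."""
    kp: float
    ki: float
    kd: float
    dt: float = 0.05  # control period in seconds (hypothetical value)
    _integral: float = field(default=0.0, init=False)
    _prev_error: float = field(default=0.0, init=False)

    def step(self, error: float) -> float:
        # Accumulate the integral term and estimate the derivative by finite difference.
        self._integral += error * self.dt
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


def control_from_waypoints(waypoints, speed, speed_pid, steer_pid, wp_dt=0.5):
    """Map the first two planned waypoints to (throttle, steer).

    waypoints: list of (x, y) in the ego frame, in metres, spaced wp_dt seconds apart.
    speed:     current forward speed in m/s.
    """
    (x0, y0), (x1, y1) = waypoints[0], waypoints[1]

    # Longitudinal: target speed is the distance between consecutive waypoints over wp_dt.
    target_speed = math.hypot(x1 - x0, y1 - y0) / wp_dt
    throttle = min(max(speed_pid.step(target_speed - speed), 0.0), 1.0)

    # Lateral: steer toward the midpoint of the first two waypoints.
    aim_x, aim_y = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    heading_error = math.atan2(aim_y, aim_x)
    steer = min(max(steer_pid.step(heading_error), -1.0), 1.0)
    return throttle, steer
```

In this sketch the waypoints would come from the GRU path planner, and the returned pair would be sent to the simulator as the vehicle's throttle and steering commands; braking is omitted for brevity.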