The use of electric vehicles (EVs) in the last mile is appealing from both sustainability and operational cost perspectives. In addition to the inherent cost efficiency of EVs, selling energy back to the grid during periods of peak grid demand is a potential source of additional revenue for a fleet operator. To achieve this, EVs have to be at specific locations (discharge points) at specific times (peak periods), even while meeting their core purpose of delivering goods to customers. In this work, we consider the problem of EV routing with constraints on loading capacity, time windows, and vehicle-to-grid energy supply (CEVRPTW-D), which must not only satisfy multiple system objectives but also scale efficiently to large problem sizes involving hundreds of customers and discharge stations. We present QuikRouteFinder, which uses reinforcement learning (RL) for EV routing to overcome these challenges. Using Solomon datasets, results from RL are compared against an exact formulation based on a mixed-integer linear program (MILP) and a genetic algorithm (GA) metaheuristic. On average, the results show that RL is 24 times faster than MILP and GA, while remaining close to the optimal solution in quality (within 20%).