Influence maximization (IM) is a combinatorial problem of identifying a subset of nodes called the seed nodes in a network (graph), which when activated, provide a maximal spread of influence in the network for a given diffusion model and a budget for seed set size. IM has numerous applications such as viral marketing, epidemic control, sensor placement and other network-related tasks. However, the uses are limited due to the computational complexity of current algorithms. Recently, learning heuristics for IM have been explored to ease the computational burden. However, there are serious limitations in current approaches such as: (1) IM formulations only consider influence via spread and ignore self activation; (2) scalability to large graphs; (3) generalizability across graph families; (4) low computational efficiency with a large running time to identify seed sets for every test network. In this work, we address each of these limitations through a unique approach that involves (1) formulating a generic IM problem as a Markov decision process that handles both intrinsic and influence activations; (2) employing double Q learning to estimate seed nodes; (3) ensuring scalability via sub-graph based representations; and (4) incorporating generalizability via meta-learning across graph families. Extensive experiments are carried out in various standard networks to validate performance of the proposed Graph Meta Reinforcement learning (GraMeR) framework. The results indicate that GraMeR is multiple orders faster and generic than conventional approaches.