The lack of large-scale real datasets with annotationsmakes transfer learning a necessity for video activity under-standing. Within this scope, we aim at developing an effec-tive method for low-shot transfer learning for first-personaction classification. We leverage independently trained lo-cal visual cues to learn representations that can be trans-ferred from a source domain providing primitive action la-bels to a target domain with only a handful of examples.Such visual cues include object-object interactions, handgrasps and motion within regions that are a function of handlocations. We suggest a framework based on meta-learningto appropriately extract the distinctive and domain invari-ant components of the deployed visual cues, so to be able totransfer action classification models across public datasetscaptured with different scene configurations. We thoroughlyevaluate our methodology and report promising results overstate-of-the-art action classification approaches for bothinter-class and inter-dataset transfer.