Numerical weather prediction (NWP) models require ever-growing computing time and resources, yet still have difficulty predicting weather extremes. Here we introduce a data-driven framework that is based on analog forecasting (prediction using similar past patterns) and employs a novel deep learning pattern-recognition technique (capsule neural networks, CapsNets) together with an impact-based auto-labeling strategy. CapsNets are trained on mid-tropospheric large-scale circulation patterns (Z500) labeled $0$ to $4$ depending on the existence and geographical region of surface temperature extremes over North America several days ahead. Using only Z500, the trained networks predict the occurrence and region of cold or heat waves with accuracies (recalls) that decline from $69\%$ to $45\%$ ($77\%$ to $48\%$) for cold waves and from $62\%$ to $41\%$ ($73\%$ to $47\%$) for heat waves as the lead time grows from $1$ to $5$ days. CapsNets outperform simpler techniques such as convolutional neural networks and logistic regression. Using both temperature and Z500, accuracies (recalls) with CapsNets increase to $\sim 80\%$ ($88\%$), showing the promise of multi-modal data-driven frameworks for accurate and fast extreme weather predictions, which can augment NWP efforts in providing early warnings.
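To make the analog-forecasting idea concrete, the following is a minimal sketch of its plainest form: find the past patterns closest to today's pattern and reuse their labels. It is not the CapsNet framework described above; it swaps in a simple nearest-neighbor vote, and the function name `analog_forecast`, the toy patterns in `toy_library`, the Euclidean distance, and the choice of `k` are all illustrative assumptions rather than details from this work.

```python
import math

# Hypothetical toy "library" of past patterns: each entry pairs a flattened
# circulation pattern (here only 4 numbers) with an integer label in 0 to 4.
# Values and labels are made up for illustration.
toy_library = [
    ([1.0, 1.0, 0.0, 0.0], 1),
    ([0.9, 1.1, 0.1, 0.0], 1),
    ([0.0, 0.0, 1.0, 1.0], 2),
    ([0.1, 0.0, 0.9, 1.0], 2),
    ([0.5, 0.5, 0.5, 0.5], 0),
]


def analog_forecast(query, library, k=3):
    """Return the majority label among the k past patterns closest to `query`.

    Closeness is Euclidean distance (an assumption). Ties in the vote are
    broken in favor of the label whose nearest analog is closest.
    """
    # Rank past patterns from most to least similar to the query pattern.
    ranked = sorted(library, key=lambda item: math.dist(query, item[0]))
    nearest_labels = [label for _, label in ranked[:k]]

    # Count votes per label among the k nearest analogs.
    votes = {}
    for label in nearest_labels:
        votes[label] = votes.get(label, 0) + 1

    # Most votes wins; on a tie, prefer the label seen earliest in the ranking.
    return max(votes, key=lambda lab: (votes[lab], -nearest_labels.index(lab)))
```

Calling `analog_forecast([1.0, 0.9, 0.1, 0.0], toy_library)` picks the label shared by the two closest toy patterns. A learned pattern-recognition model replaces this hand-chosen distance with features fitted to the labeled data.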