Complex or co-existing diseases are commonly treated using drug combinations, which can lead to higher risk of adverse side effects. The detection of polypharmacy side effects is usually done in Phase IV clinical trials, but there are still plenty which remain undiscovered when the drugs are put on the market. Such accidents have been affecting an increasing proportion of the population (15% in the US now) and it is thus of high interest to be able to predict the potential side effects as early as possible. Systematic combinatorial screening of possible drug-drug interactions (DDI) is challenging and expensive. However, the recent significant increases in data availability from pharmaceutical research and development efforts offer a novel paradigm for recovering relevant insights for DDI prediction. Accordingly, several recent approaches focus on curating massive DDI datasets (with millions of examples) and training machine learning models on them. Here we propose a neural network architecture able to set state-of-the-art results on this task---using the type of the side-effect and the molecular structure of the drugs alone---by leveraging a co-attentional mechanism. In particular, we show the importance of integrating joint information from the drug pairs early on when learning each drug's representation.