Sentiments in opinionated text are often determined by both aspects and target words (or targets). We observe that targets and aspects interrelate in subtle ways, often yielding conflicting sentiments. Thus, a naive aggregation of sentiments from aspects and targets treated separately, as in existing sentiment analysis models, impairs performance. We propose SNAT, an approach that jointly considers aspects and targets when inferring sentiments. To capture and quantify relationships between targets and context words, SNAT uses a selective self-attention mechanism that handles implicit or missing targets. Specifically, SNAT involves two layers of attention mechanisms, respectively, for selective attention between targets and context words and attention over words based on aspects. On benchmark datasets, SNAT outperforms leading models by a large margin, yielding (absolute) gains in accuracy of 1.8% to 5.2%.