Abstract:Work on summarization has explored both reinforcement learning (RL) optimization using ROUGE as a reward and syntax-aware models, such as models those input is enriched with part-of-speech (POS)-tags and dependency information. However, it is not clear what is the respective impact of these approaches beyond the standard ROUGE evaluation metric. Especially, RL-based for summarization is becoming more and more popular. In this paper, we provide a detailed comparison of these two approaches and of their combination along several dimensions that relate to the perceived quality of the generated summaries: number of repeated words, distribution of part-of-speech tags, impact of sentence length, relevance and grammaticality. Using the standard Gigaword sentence summarization task, we compare an RL self-critical sequence training (SCST) method with syntax-aware models that leverage POS tags and Dependency information. We show that on all qualitative evaluations, the combined model gives the best results, but also that only training with RL and without any syntactic information already gives nearly as good results as syntax-aware models with less parameters and faster training convergence.
Abstract:We study in this work the importance of depth in convolutional models for text classification, either when character or word inputs are considered. We show on 5 standard text classification and sentiment analysis tasks that deep models indeed give better performances than shallow networks when the text input is represented as a sequence of characters. However, a simple shallow-and-wide network outperforms deep models such as DenseNet with word inputs. Our shallow word model further establishes new state-of-the-art performances on two datasets: Yelp Binary (95.9\%) and Yelp Full (64.9\%).