Abstract: In this work, we introduce a lightweight discourse connective detection system. Employing gradient boosting trained on straightforward, low-complexity features, the proposed approach sidesteps the computational demands of current approaches that rely on deep neural networks. Despite its simplicity, our approach achieves competitive results while offering significant gains in runtime, even on CPU. Furthermore, its stable performance across two unrelated languages suggests that our system is robust in multilingual scenarios. The model is designed to support the annotation of discourse relations, particularly in resource-limited settings, while minimizing performance loss.
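To make the approach concrete, the following is a minimal sketch of a gradient-boosting connective detector over simple token-level features, using scikit-learn. It is not the authors' implementation: the feature set, the toy data, and all names below are illustrative assumptions, shown only to indicate how such a lightweight, CPU-friendly pipeline could be assembled.

```python
# Hedged sketch: gradient boosting over simple surface features for
# discourse connective detection. Feature set and data are hypothetical,
# not the system described in the abstract.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction import DictVectorizer


def token_features(tokens, i):
    """Low-complexity surface features for the token at position i (illustrative set)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        "sent_initial": i == 0,
    }


# Toy training data: each token is labeled 1 if it is part of a discourse connective.
sentences = [
    (["However", ",", "the", "model", "is", "fast", "."], [1, 0, 0, 0, 0, 0, 0]),
    (["The", "corpus", "is", "small", "but", "useful", "."], [0, 0, 0, 0, 1, 0, 0]),
]

X_dicts, y = [], []
for tokens, labels in sentences:
    for i, label in enumerate(labels):
        X_dicts.append(token_features(tokens, i))
        y.append(label)

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)

# Gradient boosting trains and predicts quickly on CPU for feature sets like this.
clf = GradientBoostingClassifier(n_estimators=100)
clf.fit(X, y)

# Flag connective candidates in an unseen sentence.
test_tokens = ["Therefore", ",", "we", "stop", "."]
test_X = vec.transform([token_features(test_tokens, i) for i in range(len(test_tokens))])
print(clf.predict(test_X))
```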
Abstract: We describe Turkish Discourse Bank 1.2, the latest version of a discourse corpus annotated for explicitly or implicitly conveyed discourse relations, their constitutive units, and their senses in the Penn Discourse Treebank style. We present an evaluation of the recently added tokens and examine three commonly occurring dependency patterns that hold among the constitutive units of a pair of adjacent discourse relations, namely shared arguments, full embedding, and partial containment of a discourse relation. We report three major findings: (a) implicitly conveyed relations occur more often than explicitly conveyed relations in the data; (b) two adjacent implicit discourse relations are much more likely to share an argument than two adjacent explicit relations; (c) both full embedding and partial containment of discourse relations are pervasive in the corpus, which may be partly due to subordinator connectives whose preposed subordinate clause tends to be selected together with the matrix clause rather than on its own. Finally, we briefly discuss the implications of our findings for Turkish discourse parsing.