Omid Kashefi

ArgRewrite V.2: an Annotated Argumentative Revisions Corpus

Jun 03, 2022

Omid Kashefi, Tazin Afrin, Meghan Dale, Christopher Olshefski, Amanda Godley, Diane Litman, Rebecca Hwa

Abstract: Analyzing how humans revise their writings is an interesting research question, not only from an educational perspective but also in terms of artificial intelligence. Better understanding of this process could facilitate many NLP applications, from intelligent tutoring systems to supportive and collaborative writing environments. Developing these applications, however, requires revision corpora, which are not widely available. In this work, we present ArgRewrite V.2, a corpus of annotated argumentative revisions, collected from two cycles of revisions to argumentative essays about self-driving cars. Annotations are provided at different levels of purpose granularity (coarse and fine) and scope (sentential and subsentential). In addition, the corpus includes the revision goal given to each writer, essay scores, annotation verification, pre- and post-study surveys collected from participants as meta-data. The variety of revision unit scope and purpose granularity levels in ArgRewrite, along with the inclusion of new types of meta-data, can make it a useful resource for research and applications that involve revision analysis. We demonstrate some potential applications of ArgRewrite V.2 in the development of automatic revision purpose predictors, as a training source and benchmark.

* Lang Resources & Evaluation (2022)

Unsupervised Part-of-Speech Induction

Jan 10, 2018

Omid Kashefi

Abstract: Part-of-Speech (POS) tagging is an old and fundamental task in natural language processing. While supervised POS taggers have shown promising accuracy, supervised methods are not always feasible because labeled data is often lacking. In this project, we attempt to induce POS tags without supervision by iteratively looking for recurring patterns of words through a hierarchical agglomerative clustering process. Our approach shows promising results when compared to the tagging results of state-of-the-art unsupervised POS taggers.

MIZAN: A Large Persian-English Parallel Corpus

Jan 10, 2018

Omid Kashefi

Abstract: Machine translation, one of the most essential tasks in natural language processing, is now highly dependent on multilingual parallel corpora. In this paper, we introduce the largest Persian-English parallel corpus, with more than one million sentence pairs collected from masterpieces of literature. We also present the acquisition process and statistics of the corpus, and experiment with a baseline statistical machine translation system using the corpus.
