Aman Saini

Spivavtor: An Instruction Tuned Ukrainian Text Editing Model

Apr 29, 2024

Aman Saini, Artem Chernodub, Vipul Raheja, Vivek Kulkarni

Abstract: We introduce Spivavtor, a dataset and a set of instruction-tuned models for text editing focused on the Ukrainian language. Spivavtor is the Ukrainian-focused adaptation of the English-only CoEdIT model. Similar to CoEdIT, Spivavtor performs text editing tasks by following instructions in Ukrainian. This paper describes the details of the Spivavtor-Instruct dataset and Spivavtor models. We evaluate Spivavtor on a variety of text editing tasks in Ukrainian, such as Grammatical Error Correction (GEC), Text Simplification, Coherence, and Paraphrasing, and demonstrate its superior performance on all of them. We publicly release our best-performing models and data as resources to the community to advance further research in this space.

* Accepted to UNLP Workshop 2024

TweetNERD -- End to End Entity Linking Benchmark for Tweets

Oct 14, 2022

Shubhanshu Mishra, Aman Saini, Raheleh Makki, Sneha Mehta, Aria Haghighi, Ali Mollahosseini

Abstract: Named Entity Recognition and Disambiguation (NERD) systems are foundational for information retrieval, question answering, event detection, and other natural language processing (NLP) applications. We introduce TweetNERD, a dataset of 340K+ Tweets spanning 2010-2021, for benchmarking NERD systems on Tweets. It is the largest and most temporally diverse open-source benchmark dataset for NERD on Tweets and can be used to facilitate research in this area. We describe the evaluation setup with TweetNERD for three NERD tasks: Named Entity Recognition (NER), Entity Linking with True Spans (EL), and End to End Entity Linking (End2End), and report the performance of existing publicly available methods on specific TweetNERD splits. TweetNERD is available at https://doi.org/10.5281/zenodo.6617192 under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. More details are available at https://github.com/twitter-research/TweetNERD.

* 19 pages, 2 figures. Accepted to the Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track 2022.
