Sanjay Basu

OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset

Jun 20, 2024

Allen Roush, Yusuf Shabazz, Arvind Balaji, Peter Zhang, Stefano Mezza, Markus Zhang, Sanjay Basu, Sriram Vishwanath, Mehdi Fatemi, Ravid Schwartz-Ziv

Abstract: We introduce OpenDebateEvidence, a comprehensive dataset for argument mining and summarization sourced from the American Competitive Debate community. This dataset includes over 3.5 million documents with rich metadata, making it one of the most extensive collections of debate evidence. OpenDebateEvidence captures the complexity of arguments in high school and college debates, providing valuable resources for training and evaluation. Our extensive experiments demonstrate the efficacy of fine-tuning state-of-the-art large language models for argumentative abstractive summarization across various methods, models, and datasets. By providing this comprehensive resource, we aim to advance computational argumentation and support practical applications for debaters, educators, and researchers. OpenDebateEvidence is publicly available to support further research and innovation in computational argumentation. Access it here: https://huggingface.co/datasets/Yusuf5/OpenCaselist

* Accepted for publication at ARGMIN 2024 (ACL 2024)

Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio

Jun 28, 2023

Allen Roush, Sanjay Basu, Akshay Moorthy, Dmitry Dubovoy

Abstract: Despite rapid advancement in the field of Constrained Natural Language Generation, little time has been spent on exploring the potential of language models whose vocabularies have been lexically, semantically, and/or phonetically constrained. We find that most language models generate compelling text even under significant constraints. We present a simple and universally applicable technique for modifying the output of a language model by compositionally applying filter functions to the language model's vocabulary before a unit of text is generated. This approach is plug-and-play and requires no modification to the model. To showcase the value of this technique, we present an easy-to-use AI writing assistant called Constrained Text Generation Studio (CTGS). CTGS allows users to generate or choose from text with any combination of a wide variety of constraints, such as banning a particular letter, forcing the generated words to have a certain number of syllables, and/or forcing the words to be partial anagrams of another word. We introduce a novel dataset of prose that omits the letter "e". We show that our method results in strictly superior performance compared to fine-tuning alone on this dataset. We also present a Hugging Face Space web app called Gadsby that demonstrates this technique. The code is available to the public here: https://github.com/Hellisotherpeople/Constrained-Text-Generation-Studio

* Published in the proceedings of the 2nd Workshop on When Creative AI Meets Conversational AI (CAI2), COLING 2022, 6 pages, System Demonstration Paper
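The idea of compositionally applying filter functions to the vocabulary can be sketched in a few lines. This is a minimal illustration, not code from the CTGS repository: a toy dictionary of next-token scores stands in for a real model's logits, and the names `ban_letter`, `partial_anagram_of`, `compose`, and `constrained_argmax` are hypothetical. With a real model one would instead set the logits of rejected vocabulary entries to negative infinity before sampling.

```python
from collections import Counter
from typing import Callable, Dict

# A filter is a predicate over one vocabulary entry: True means "keep".
TokenFilter = Callable[[str], bool]


def ban_letter(letter: str) -> TokenFilter:
    """Reject any token containing `letter` (case-insensitive), as in a lipogram."""
    letter = letter.lower()
    return lambda tok: letter not in tok.lower()


def partial_anagram_of(word: str) -> TokenFilter:
    """Keep tokens whose letters can all be drawn from the letters of `word`."""
    pool = Counter(word.lower())

    def check(tok: str) -> bool:
        need = Counter(tok.strip().lower())
        return all(pool[ch] >= count for ch, count in need.items())

    return check


def compose(*filters: TokenFilter) -> TokenFilter:
    """Combine filters: a token survives only if every filter accepts it."""
    return lambda tok: all(f(tok) for f in filters)


def constrained_argmax(scores: Dict[str, float], allowed: TokenFilter) -> str:
    """Greedy step: filter the vocabulary first, then pick the best survivor."""
    survivors = {tok: s for tok, s in scores.items() if allowed(tok)}
    if not survivors:
        raise ValueError("the constraints removed every token")
    return max(survivors, key=survivors.get)


# Toy next-token scores standing in for a language model's logits (hypothetical).
scores = {"the": 3.0, "elephant": 2.8, "a": 2.5, "cat": 1.0, "art": 0.5}
print(constrained_argmax(scores, ban_letter("e")))  # → a
print(constrained_argmax(scores, compose(ban_letter("a"), partial_anagram_of("the"))))  # → the
```

Because each constraint is just a predicate, new ones (for example a syllable counter) can be added without touching the model or the decoding loop, which mirrors the plug-and-play property the abstract describes.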

NGBoost: Natural Gradient Boosting for Probabilistic Prediction

Oct 09, 2019

Tony Duan, Anand Avati, Daisy Yi Ding, Sanjay Basu, Andrew Y. Ng, Alejandro Schuler

Abstract: We present Natural Gradient Boosting (NGBoost), an algorithm that brings probabilistic prediction capability to gradient boosting in a generic way. Predictive uncertainty estimation is crucial in many applications such as healthcare and weather forecasting. Probabilistic prediction, in which the model outputs a full probability distribution over the entire outcome space, is a natural way to quantify those uncertainties. Gradient Boosting Machines have been widely successful in prediction tasks on structured input data, but a simple boosting solution for probabilistic prediction of real-valued outputs has yet to be developed. NGBoost is a gradient boosting approach that uses the natural gradient to address the technical challenges that make generic probabilistic prediction hard with existing gradient boosting methods. Our approach is modular with respect to the choice of base learner, probability distribution, and scoring rule. We show empirically on several regression datasets that NGBoost provides competitive predictive performance on both uncertainty estimates and traditional metrics.
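To make the role of the natural gradient concrete, here is a minimal pure-Python sketch of a boosting loop in that spirit. The simplifying choices are assumptions for illustration, not the paper's exact configuration or the `ngboost` package's code: a Normal distribution parameterized by (mu, log sigma), the negative log-likelihood as the scoring rule, one-feature least-squares stumps as base learners, and a fixed learning rate with no per-stage step-size scaling. For this parameterization the Fisher information is diag(1/sigma^2, 2), so the natural gradient is the ordinary gradient rescaled by diag(sigma^2, 1/2). All function names are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Least-squares regression stump on a single feature."""
    pairs = sorted(zip(x, r))
    n, total = len(pairs), sum(r)
    mean_all = total / n
    # Start from the no-split stump (predict the mean everywhere).
    best_score, best = n * mean_all ** 2, (math.inf, mean_all, mean_all)
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between equal feature values
        lm, rm = left_sum / i, (total - left_sum) / (n - i)
        score = i * lm ** 2 + (n - i) * rm ** 2  # larger score = smaller squared error
        if score > best_score:
            best_score, best = score, ((pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    return best


def predict_stump(stump: Stump, xi: float) -> float:
    threshold, left, right = stump
    return left if xi <= threshold else right


def fit_ngboost_normal(x, y, n_estimators=200, lr=0.05):
    n = len(y)
    mu0 = sum(y) / n
    s0 = 0.5 * math.log(sum((yi - mu0) ** 2 for yi in y) / n)  # log of marginal std
    mu, s = [mu0] * n, [s0] * n
    stages = []
    for _ in range(n_estimators):
        # Ordinary NLL gradient: (-(y-mu)/sigma^2, 1 - (y-mu)^2/sigma^2).
        # Multiplying by the inverse Fisher diag(sigma^2, 1/2) gives the natural gradient:
        g_mu = [-(yi - mi) for yi, mi in zip(y, mu)]
        g_s = [0.5 * (1.0 - (yi - mi) ** 2 / math.exp(2 * si)) for yi, mi, si in zip(y, mu, s)]
        st_mu, st_s = fit_stump(x, g_mu), fit_stump(x, g_s)  # one base learner per parameter
        stages.append((st_mu, st_s))
        mu = [mi - lr * predict_stump(st_mu, xi) for mi, xi in zip(mu, x)]
        s = [si - lr * predict_stump(st_s, xi) for si, xi in zip(s, x)]
    return mu0, s0, lr, stages


def predict_normal(model, xq: float) -> Tuple[float, float]:
    """Return (mean, standard deviation) of the predicted Normal at xq."""
    mu0, s0, lr, stages = model
    mu = mu0 - lr * sum(predict_stump(st_mu, xq) for st_mu, _ in stages)
    s = s0 - lr * sum(predict_stump(st_s, xq) for _, st_s in stages)
    return mu, math.exp(s)


# Synthetic data (made up): noise of +/-0.1 around 0 for x < 1, +/-2 around 5 for x >= 1.
x = [i / 100 for i in range(200)]
y = [(0.1 if i % 2 == 0 else -0.1) if i < 100 else 5 + (2 if i % 2 == 0 else -2) for i in range(200)]
model = fit_ngboost_normal(x, y)
print(predict_normal(model, 0.5), predict_normal(model, 1.5))
```

Under this parameterization the mean update reduces to the ordinary least-squares residual, and the log-sigma step can never exceed 0.5 in magnitude on the shrinking side, which keeps the scale update stable without tuning. On the toy data the fitted standard deviation should come out small for x < 1 and larger for x ≥ 1, i.e. the model captures heteroscedastic noise rather than only a point estimate.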

Forecasting Internally Displaced Population Migration Patterns in Syria and Yemen

Jun 22, 2018

Benjamin Q. Huynh, Sanjay Basu

Abstract: Armed conflict has led to an unprecedented number of internally displaced persons (IDPs), individuals who are forced out of their homes but remain within their country. IDPs often urgently require shelter, food, and healthcare, yet predicting when large fluxes of IDPs will cross into an area remains a major challenge for aid delivery organizations. Accurate forecasting of IDP migration would enable humanitarian aid groups to allocate resources more effectively during conflicts. We show that the monthly flow of IDPs from province to province in both Syria and Yemen can be accurately forecast one month in advance using publicly available data. We model monthly IDP flow using food price, fuel price, wage, geospatial, and news data. We find that machine learning approaches forecast migration trends more accurately than baseline persistence models. Our findings thus potentially enable proactive aid allocation for IDPs in anticipation of forecasted arrivals.
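A persistence model, the baseline named above, simply predicts that next month's flow equals this month's. The sketch below shows how a one-month-ahead comparison against such a baseline could be scored. It assumes a single province-to-province series of monthly counts and mean absolute error as the metric, and a three-month moving average stands in for the paper's machine learning models; the function names and the series are hypothetical and say nothing about Syria or Yemen.

```python
from typing import List, Sequence


def persistence_forecast(series: Sequence[float]) -> List[float]:
    """One-month-ahead persistence: next month's flow = this month's flow."""
    return list(series[:-1])


def moving_average_forecast(series: Sequence[float], window: int = 3) -> List[float]:
    """Hypothetical stand-in model: mean of up to `window` most recent months."""
    preds = []
    for t in range(1, len(series)):
        recent = series[max(0, t - window):t]
        preds.append(sum(recent) / len(recent))
    return preds


def mae(preds: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between aligned forecasts and observations."""
    return sum(abs(p - a) for p, a in zip(preds, actual)) / len(actual)


# Made-up monthly IDP arrivals for one province pair.
flows = [100, 120, 90, 130, 110]
targets = flows[1:]  # every month after the first is forecast from the months before it
print(f"persistence MAE: {mae(persistence_forecast(flows), targets):.2f}")  # → 27.50
print(f"moving-average MAE: {mae(moving_average_forecast(flows), targets):.2f}")  # → 17.50
```

Both forecasters are scored on the same targets, so any learned model can be dropped in place of `moving_average_forecast` and judged by whether it beats the persistence MAE.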
