Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jitin Singla

On Learning with LAD

Sep 28, 2023

C. A. Jothishwaran, Biplav Srivastava, Jitin Singla, Sugata Gangopadhyay

Abstract:The logical analysis of data, LAD, is a technique that yields two-class classifiers based on Boolean functions having disjunctive normal form (DNF) representation. Although LAD algorithms employ optimization techniques, the resulting binary classifiers or binary rules do not lead to overfitting. We propose a theoretical justification for the absence of overfitting by estimating the Vapnik-Chervonenkis dimension (VC dimension) for LAD models where hypothesis sets consist of DNFs with a small number of cubic monomials. We illustrate and confirm our observations empirically.

Via

Access Paper or Ask Questions

Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation

May 23, 2023

Ayush Maheshwari, Ashim Gupta, Amrith Krishna, Ganesh Ramakrishnan, G. Anil Kumar, Jitin Singla

Figure 1 for Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation

Figure 2 for Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation

Abstract:Sanskrit is a low-resource language with a rich heritage. Digitized Sanskrit corpora reflective of the contemporary usage of Sanskrit, specifically that too in prose, is heavily under-represented at present. Presently, no such English-Sanskrit parallel dataset is publicly available. We release a dataset, S\={a}mayik, of more than 42,000 parallel English-Sanskrit sentences, from four different corpora that aim to bridge this gap. Moreover, we also release benchmarks adapted from existing multilingual pretrained models for Sanskrit-English translation. We include training splits from our contemporary dataset and the Sanskrit-English parallel sentences from the training split of Itih\={a}sa, a previously released classical era machine translation dataset containing Sanskrit.

Via

Access Paper or Ask Questions