Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dmitry Karpov

No One-Size-Fits-All: Building Systems For Translation to Bashkir, Kazakh, Kyrgyz, Tatar and Chuvash Using Synthetic And Original Data

Feb 04, 2026

Dmitry Karpov

Abstract:We explore machine translation for five Turkic language pairs: Russian-Bashkir, Russian-Kazakh, Russian-Kyrgyz, English-Tatar, English-Chuvash. Fine-tuning nllb-200-distilled-600M with LoRA on synthetic data achieved chrF++ 49.71 for Kazakh and 46.94 for Bashkir. Prompting DeepSeek-V3.2 with retrieved similar examples achieved chrF++ 39.47 for Chuvash. For Tatar, zero-shot or retrieval-based approaches achieved chrF++ 41.6, while for Kyrgyz the zero-shot approach reached 45.6. We release the dataset and the obtained weights.

* Accepted to EACL 2026 (LoResMT workshop)

Via

Access Paper or Ask Questions

Noise2Ghost: Self-supervised deep convolutional reconstruction for ghost imaging

Apr 14, 2025

Mathieu Manni, Dmitry Karpov, K. Joost Batenburg, Sharon Shwartz, Nicola Viganò

Figure 1 for Noise2Ghost: Self-supervised deep convolutional reconstruction for ghost imaging

Figure 2 for Noise2Ghost: Self-supervised deep convolutional reconstruction for ghost imaging

Figure 3 for Noise2Ghost: Self-supervised deep convolutional reconstruction for ghost imaging

Figure 4 for Noise2Ghost: Self-supervised deep convolutional reconstruction for ghost imaging

Abstract:We present a new self-supervised deep-learning-based Ghost Imaging (GI) reconstruction method, which provides unparalleled reconstruction performance for noisy acquisitions among unsupervised methods. We present the supporting mathematical framework and results from theoretical and real data use cases. Self-supervision removes the need for clean reference data while offering strong noise reduction. This provides the necessary tools for addressing signal-to-noise ratio concerns for GI acquisitions in emerging and cutting-edge low-light GI scenarios. Notable examples include micro- and nano-scale x-ray emission imaging, e.g., x-ray fluorescence imaging of dose-sensitive samples. Their applications include in-vivo and in-operando case studies for biological samples and batteries.

Via

Access Paper or Ask Questions

Monolingual and Cross-Lingual Knowledge Transfer for Topic Classification

Jul 04, 2023

Dmitry Karpov, Mikhail Burtsev

Abstract:This article investigates the knowledge transfer from the RuQTopics dataset. This Russian topical dataset combines a large sample number (361,560 single-label, 170,930 multi-label) with extensive class coverage (76 classes). We have prepared this dataset from the "Yandex Que" raw data. By evaluating the RuQTopics - trained models on the six matching classes of the Russian MASSIVE subset, we have proved that the RuQTopics dataset is suitable for real-world conversational tasks, as the Russian-only models trained on this dataset consistently yield an accuracy around 85\% on this subset. We also have figured out that for the multilingual BERT, trained on the RuQTopics and evaluated on the same six classes of MASSIVE (for all MASSIVE languages), the language-wise accuracy closely correlates (Spearman correlation 0.773 with p-value 2.997e-11) with the approximate size of the pretraining BERT's data for the corresponding language. At the same time, the correlation of the language-wise accuracy with the linguistical distance from Russian is not statistically significant.

Via

Access Paper or Ask Questions