Arda Yüksel

TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish

Jul 17, 2024

Arda Yüksel, Abdullatif Köksal, Lütfi Kerem Şenel, Anna Korhonen, Hinrich Schütze

Abstract: Multiple-choice question answering tasks evaluate the reasoning, comprehension, and mathematical abilities of Large Language Models (LLMs). While existing benchmarks employ automatic translation for multilingual evaluation, this approach is error-prone and can introduce culturally biased questions, especially in the social sciences. We introduce TurkishMMLU, the first multitask, multiple-choice Turkish QA benchmark, to evaluate LLMs' understanding of the Turkish language. TurkishMMLU includes over 10,000 questions covering 9 subjects from Turkish high-school curricula. The questions are written by curriculum experts and are suited to the high-school curricula in Turkey, spanning natural sciences and mathematics as well as more culturally representative topics such as Turkish Literature and the history of the Turkish Republic. We evaluate over 20 LLMs, including multilingual open-source (e.g., Gemma, Llama, mT5), closed-source (GPT-4o, Claude, Gemini), and Turkish-adapted (e.g., Trendyol) models. Our extensive evaluation covers zero-shot and few-shot evaluation, chain-of-thought reasoning, and an analysis of question difficulty alongside model performance. We provide an in-depth analysis of the Turkish capabilities and limitations of current LLMs to offer insights for future Turkish-language LLMs. We publicly release our code for the dataset and evaluation: https://github.com/ArdaYueksel/TurkishMMLU.
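To make the zero-shot and few-shot setup concrete, here is a minimal, hypothetical sketch of how a multiple-choice benchmark of this kind is commonly scored: k solved examples are prepended to the target question (k = 0 gives zero-shot), the model's completion is parsed for an option letter, and accuracy is the fraction of parsed letters that match the gold letter. The `Question` fields, the `Answer:` prompt template, and the letter-extraction rule are illustrative assumptions, not the released TurkishMMLU code or data format; the linked repository holds the actual evaluation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence

LETTERS = "ABCDEFGH"


@dataclass
class Question:
    # Hypothetical schema for illustration; not the dataset's actual field names.
    question: str
    choices: List[str]
    answer: str  # gold option letter, e.g. "C"


def format_question(q: Question, with_answer: bool) -> str:
    """Render a question with lettered options; optionally append the gold answer."""
    lines = [q.question]
    lines += [f"{LETTERS[i]}) {c}" for i, c in enumerate(q.choices)]
    lines.append(f"Answer: {q.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(target: Question, shots: Sequence[Question]) -> str:
    """k-shot prompt: solved examples first, then the unsolved target question."""
    blocks = [format_question(s, with_answer=True) for s in shots]
    blocks.append(format_question(target, with_answer=False))
    return "\n\n".join(blocks)


def extract_letter(output: str, n_choices: int) -> Optional[str]:
    """Return the first standalone valid option letter in the model output, if any."""
    valid = LETTERS[:n_choices]
    # \b ensures the letter is a standalone token, so the "A" in "Answer" is skipped.
    match = re.search(rf"\b([{valid}])\b", output)
    return match.group(1) if match else None


def accuracy(questions: Sequence[Question], outputs: Sequence[str]) -> float:
    """Fraction of outputs whose extracted letter equals the gold letter."""
    correct = sum(
        extract_letter(o, len(q.choices)) == q.answer
        for q, o in zip(questions, outputs)
    )
    return correct / len(questions)
```

Unparseable outputs count as wrong here (`None` never equals a gold letter); an actual evaluation might instead compare option log-likelihoods, which avoids parsing altogether.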

Butterfly Effect Attack: Tiny and Seemingly Unrelated Perturbations for Object Detection

Nov 14, 2022

Nguyen Anh Vu Doan, Arda Yüksel, Chih-Hong Cheng

Abstract: This work aims to explore and identify tiny and seemingly unrelated image perturbations that degrade object detection performance. While tininess can naturally be defined using $L_p$ norms, we characterize the degree of "unrelatedness" between a perturbation and an object by the pixel distance between them. Triggering prediction errors while satisfying both objectives can be formulated as a multi-objective optimization problem, for which we use genetic algorithms to guide the search. The results demonstrate that (invisible) perturbations on the right part of the image can drastically change the outcome of object detection on the left. An extensive evaluation reaffirms our conjecture that transformer-based object detection networks are more susceptible to butterfly effects than single-stage object detection networks such as YOLOv5.
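The two objectives described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the perturbation is assumed to be a dense array `delta` with the image's shape, the object is assumed to be an axis-aligned box `(x_min, y_min, x_max, y_max)` in inclusive pixel coordinates, and "pixel distance" is assumed to mean the Euclidean distance from the nearest perturbed pixel to the box. The function names `tininess`, `unrelatedness`, and `dominates` are hypothetical; a genetic algorithm would also need a detection-degradation score from the detector, which is omitted here.

```python
from typing import Sequence, Tuple

import numpy as np


def tininess(delta: np.ndarray, p: float = 2.0) -> float:
    """L_p norm of the flattened perturbation (smaller means tinier)."""
    return float(np.linalg.norm(delta.ravel(), ord=p))


def unrelatedness(delta: np.ndarray, box: Tuple[int, int, int, int]) -> float:
    """Minimum Euclidean pixel distance from any perturbed pixel to the box.

    Assumed convention: box = (x_min, y_min, x_max, y_max), inclusive.
    Returns 0.0 if a perturbed pixel lies inside the box, inf if nothing is perturbed.
    """
    x_min, y_min, x_max, y_max = box
    # A pixel counts as perturbed if any of its channels changed.
    mask = np.any(delta != 0, axis=-1) if delta.ndim == 3 else delta != 0
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return float("inf")
    # Per-axis distance to the box interval (0 when the coordinate falls inside it).
    dx = np.maximum(np.maximum(x_min - xs, 0), xs - x_max)
    dy = np.maximum(np.maximum(y_min - ys, 0), ys - y_max)
    return float(np.min(np.hypot(dx, dy)))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
```

In a multi-objective genetic search, each candidate could be scored as a vector such as `(tininess(delta), -unrelatedness(delta, box), detection_score)` (negated because larger distances are preferred), and Pareto dominance then keeps trade-offs between the objectives instead of collapsing them into a single weighted sum.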
