Marzena Karpinska

OverThink: Slowdown Attacks on Reasoning LLMs

Feb 05, 2025
Via arXiv

OVERTHINKING: Slowdown Attacks on Reasoning LLMs

Feb 04, 2025
Via arXiv

People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text

Jan 26, 2025
Via arXiv

Preliminary WMT24 Ranking of General MT Systems and LLMs

Jul 29, 2024
Via arXiv

CaLMQA: Exploring culturally specific long-form question answering across 23 languages

Jun 25, 2024
Via arXiv

One Thousand and One Pairs: A "novel" challenge for long-context language models

Jun 24, 2024
Via arXiv

Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation

Jun 17, 2024
Via arXiv

FABLES: Evaluating faithfulness and content selection in book-length summarization

Apr 01, 2024
Via arXiv

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Mar 30, 2024
Via arXiv

Large language models effectively leverage document-level context for literary translation, but critical errors persist

Apr 07, 2023
Via arXiv