Jeffrey Flanigan

OSGuard: A Benchmark for Safety in Computer-Use Agents

Jun 13, 2026

Right or Wrong, Models Comply: Directional Blindness in LLM Moral Judgment

Jun 12, 2026

Dialogue SWE-Bench: A Benchmark for Dialogue-Driven Coding Agents

Jun 12, 2026

MathAtlas: A Benchmark for Autoformalization in the Wild

May 13, 2026

Prompt, Translate, Fine-Tune, Re-Initialize, or Instruction-Tune? Adapting LLMs for In-Context Learning in Low-Resource Languages

Jun 23, 2025

RAC: Efficient LLM Factuality Correction with Retrieval Augmentation

Oct 21, 2024

Large Language Model Unlearning via Embedding-Corrupted Prompts

Jun 12, 2024

The Power of the Noisy Channel: Unsupervised End-to-End Task-Oriented Dialogue with LLMs

Apr 23, 2024

Future Language Modeling from Temporal Document History

Apr 16, 2024

Task Contamination: Language Models May Not Be Few-Shot Anymore

Dec 26, 2023