
Yarin Gal

Open Problems in Machine Unlearning for AI Safety

Jan 09, 2025

Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts

Dec 13, 2024

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Oct 11, 2024

Temporal-Difference Variational Continual Learning

Oct 10, 2024

TextCAVs: Debugging vision models using text

Aug 16, 2024

Variational Inference Failures Under Model Symmetries: Permutation Invariant Posteriors for Bayesian Neural Networks

Aug 10, 2024

Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs

Jun 22, 2024

The Benefits and Risks of Transductive Approaches for AI Fairness

Jun 17, 2024

Deep Bayesian Active Learning for Preference Modeling in Large Language Models

Jun 14, 2024

Estimating the Hallucination Rate of Generative AI

Jun 11, 2024