Jonathan Tu

Understanding the Inner Workings of Language Models Through Representation Dissimilarity

Oct 23, 2023

Attributing Learned Concepts in Neural Networks to Training Data

Oct 06, 2023

Robustness of edited neural networks

Feb 28, 2023