Karen Hambardzumyan

Robust LLM safeguarding via refusal feature adversarial training

Sep 30, 2024

LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models

Apr 10, 2024

Scaling Laws for Generative Mixed-Modal Language Models

Jan 10, 2023

BARTSmiles: Generative Masked Language Models for Molecular Representations

Nov 29, 2022

WARP: Word-level Adversarial ReProgramming

Jan 01, 2021

Towards JointUD: Part-of-speech Tagging and Lemmatization using Recurrent Neural Networks

Sep 10, 2018

Technical Report on the CleverHans v2.1.0 Adversarial Examples Library

Jun 27, 2018

Natural Language Inference over Interaction Space: ICLR 2018 Reproducibility Report

Feb 09, 2018