Narutatsu Ri

Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

Feb 06, 2025
Via arXiv

Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution

Sep 11, 2024
Via arXiv

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

Jul 17, 2023
Via arXiv

Contrastive Loss is All You Need to Recover Analogies as Parallel Lines

Jun 14, 2023
Via arXiv

IdEALS: Idiomatic Expressions for Advancement of Language Skills

May 24, 2023
Via arXiv

Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies

May 21, 2023
Via arXiv