Tianhao Shen

Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning

Jul 16, 2024

Planning with Large Language Models for Conversational Agents

Jul 04, 2024

DART: Deep Adversarial Automated Red Teaming for LLM Safety

Jul 04, 2024

IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons

Jun 26, 2024

GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

Jun 24, 2024

Benchmark Underestimates the Readiness of Multi-lingual Dialogue Agents

May 28, 2024

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

Feb 28, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM

Feb 25, 2024

RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models

Dec 26, 2023

Large Language Model Alignment: A Survey

Sep 26, 2023