Huan Sun

Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving

Nov 11, 2024

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Nov 10, 2024

AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts

Oct 29, 2024

AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents

Oct 22, 2024

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

Oct 14, 2024

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

Oct 07, 2024

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Oct 07, 2024

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

Sep 17, 2024

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Sep 04, 2024

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

May 27, 2024