Terry Yue Zhuo

To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack
Feb 01, 2026

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
Jan 17, 2026

An Empirical Study of Vulnerabilities in Python Packages and Their Detection
Sep 04, 2025

EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code
May 19, 2025

Less is More: Towards Green Code Large Language Models via Unified Structural Pruning
Dec 20, 2024

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Jun 26, 2024

XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
Apr 23, 2024

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Mar 30, 2024

StarCoder 2 and The Stack v2: The Next Generation
Feb 29, 2024

Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
Jan 01, 2024