Ruoxi Sun

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Nov 12, 2024
Via arXiv

AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems

Nov 09, 2024
Via arXiv

Edge Unlearning is Not "on Edge"! An Adaptive Exact Unlearning System on Resource-Constrained Devices

Oct 15, 2024
Via arXiv

Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models

Oct 09, 2024
Via arXiv

CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

Oct 02, 2024
Via arXiv

SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging

Aug 22, 2024
Via arXiv

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

Jul 16, 2024
Via arXiv

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Jul 15, 2024
Via arXiv

Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization

Jun 22, 2024
Via arXiv

On Security Weaknesses and Vulnerabilities in Deep Learning Systems

Jun 12, 2024
Via arXiv