
Amir Jalalirad

Dynamic Tool Dependency Retrieval for Efficient Function Calling

Dec 23, 2025

Bhrij Patel, Davide Belli, Amir Jalalirad, Maximilian Arnold, Aleksandr Ermolov, Bence Major


Abstract: Function-calling agents powered by Large Language Models (LLMs) select external tools to automate complex tasks. On-device agents typically use a retrieval module to select relevant tools, improving performance and reducing context length. However, existing retrieval methods rely on static and limited inputs, failing to capture multi-step tool dependencies and evolving task context. This limitation often introduces irrelevant tools that mislead the agent, degrading efficiency and accuracy. We propose Dynamic Tool Dependency Retrieval (DTDR), a lightweight retrieval method that conditions on both the initial query and the evolving execution context. DTDR models tool dependencies from function-calling demonstrations, enabling adaptive retrieval as plans unfold. We benchmark DTDR against state-of-the-art retrieval methods across multiple datasets and LLM backbones, evaluating retrieval precision, downstream task accuracy, and computational efficiency. Additionally, we explore strategies to integrate retrieved tools into prompts. Our results show that dynamic tool retrieval improves function-calling success rates by $23\%$ to $104\%$ compared to state-of-the-art static retrievers.

* 18 pages, 5 figures, 6 tables
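
As a rough illustration of the idea in the abstract above (retrieval that conditions on both the query and the execution context), the following toy sketch mixes a static query-relevance score with tool-to-tool transition counts taken from demonstration traces. It is not the DTDR method itself: the function names, the word-overlap score, the mixing weight `alpha`, and the example tools are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_transitions(demos):
    """Count how often tool b directly follows tool a in demonstration traces."""
    counts = defaultdict(Counter)
    for trace in demos:
        for prev, nxt in zip(trace, trace[1:]):
            counts[prev][nxt] += 1
    return counts


def query_score(query, description):
    """Static relevance: fraction of query words found in the tool description."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d) / len(q) if q else 0.0


def retrieve(query, history, tools, transitions, k=2, alpha=0.5):
    """Rank tools by static query relevance mixed with dependency on the last call.

    `history` is the list of tools already executed; an empty history reduces
    the ranking to the static, query-only score.
    """
    last = history[-1] if history else None
    follow = transitions.get(last, Counter())
    total = sum(follow.values())
    scored = []
    for name, desc in tools.items():
        dep = follow[name] / total if total else 0.0
        scored.append((alpha * query_score(query, desc) + (1 - alpha) * dep, name))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:k]]


# Hypothetical tool catalogue and demonstration traces.
TOOLS = {
    "search_flights": "search flights between cities",
    "book_flight": "book a flight ticket",
    "send_email": "send an email message",
    "get_weather": "get weather forecast for a city",
}
DEMOS = [
    ["search_flights", "book_flight", "send_email"],
    ["search_flights", "book_flight"],
]
```

With the same query, the ranking changes once a tool has been executed: after `book_flight` has run, the transition counts promote `send_email`, which a query-only retriever in this toy setup ranks lower.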


Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking

Dec 02, 2024

Marco Federici, Davide Belli, Mart van Baalen, Amir Jalalirad, Andrii Skliar, Bence Major, Markus Nagel, Paul Whatmough


Abstract: While mobile devices provide ever more compute power, improvements in DRAM bandwidth are much slower. This is unfortunate for large language model (LLM) token generation, which is heavily memory-bound. Previous work has proposed to leverage natural dynamic activation sparsity in ReLU-activated LLMs to reduce effective DRAM bandwidth per token. However, more recent LLMs use SwiGLU instead of ReLU, which results in little inherent sparsity. While SwiGLU activations can be pruned based on magnitude, the resulting sparsity patterns are difficult to predict, rendering previous approaches ineffective. To circumvent this issue, our work introduces Dynamic Input Pruning (DIP): a predictor-free dynamic sparsification approach, which preserves accuracy with minimal fine-tuning. DIP can further use lightweight LoRA adapters to regain some performance lost during sparsification. Lastly, we describe a novel cache-aware masking strategy, which considers the cache state and activation magnitude to further increase cache hit rate, improving LLM token rate on mobile devices. DIP outperforms other methods in terms of accuracy, memory and throughput trade-offs across simulated hardware settings. On Phi-3-Medium, DIP achieves a 46% reduction in memory and 40% increase in throughput with $<0.1$ loss in perplexity.

* Main Text: 10 pages, 11 figures. Appendix: 3 pages, 3 figures
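
The abstract above mentions magnitude-based pruning and a masking strategy that weighs cache state alongside activation magnitude. The sketch below shows one simple way such a selection could look: keep the `k` largest-magnitude entries, with an optional multiplicative bonus for entries whose weights are assumed to be cached already. The scoring rule, the `gamma` parameter, and the function names are illustrative assumptions, not the paper's formulation.

```python
def topk_mask(activations, k, cached=frozenset(), gamma=0.0):
    """Return the sorted indices of the k entries kept after pruning.

    Each entry is scored by its absolute value; entries whose index is in
    `cached` have that score scaled by (1 + gamma), so gamma = 0 gives plain
    magnitude-based top-k selection.
    """
    def score(i):
        bonus = 1.0 + gamma if i in cached else 1.0
        return abs(activations[i]) * bonus

    # Highest score first; ties are broken by index for determinism.
    ranked = sorted(range(len(activations)), key=lambda i: (-score(i), i))
    return sorted(ranked[:k])


def prune(activations, kept):
    """Zero out every activation whose index is not in `kept`."""
    kept = set(kept)
    return [a if i in kept else 0.0 for i, a in enumerate(activations)]
```

In this toy setup, a larger `gamma` trades some magnitude fidelity for reuse of already-cached entries: an entry with a slightly smaller magnitude can displace an uncached one, which is the kind of trade-off a cache-aware mask has to balance.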


GNSS Positioning using Cost Function Regulated Multilateration and Graph Neural Networks

Feb 28, 2024

Amir Jalalirad, Davide Belli, Bence Major, Songwon Jee, Himanshu Shah, Will Morrison


Abstract: In urban environments, where line-of-sight signals from GNSS satellites are frequently blocked by high-rise objects, GNSS receivers are subject to large errors in measuring satellite ranges. Heuristic methods are commonly used to estimate these errors and reduce the impact of noisy measurements on localization accuracy. In our work, we replace these error estimation heuristics with a deep learning model based on Graph Neural Networks. Additionally, by analyzing the cost function of the multilateration process, we derive an optimal method to utilize the estimated errors. Our approach guarantees that the multilateration converges to the receiver's location as the error estimation accuracy increases. We evaluate our solution on a real-world dataset containing more than 100k GNSS epochs, collected from multiple cities with diverse characteristics. The empirical results show improvements of 40% to 80% in the horizontal localization error against recent deep learning baselines as well as classical localization approaches.

* Published in The Proceedings of the Institute of Navigation GNSS+ 2023
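
To make the role of estimated range errors in multilateration concrete, here is a heavily simplified 2D sketch: a Gauss-Newton solver for the least-squares cost over range residuals, where each measured range is corrected by subtracting an estimated error before solving. Subtracting the estimate is an assumption chosen for illustration; the abstract above does not spell out the derived method, and real GNSS positioning is 3D and also estimates a receiver clock bias. All names here are hypothetical.

```python
import math


def multilaterate(anchors, ranges, est_errors=None, x0=(0.0, 0.0), iters=20):
    """2D Gauss-Newton on the cost sum_i (||x - a_i|| - (r_i - e_i))^2.

    anchors:    list of (x, y) transmitter positions
    ranges:     measured distances r_i to each anchor
    est_errors: estimated range errors e_i (defaults to all zeros)
    """
    if est_errors is None:
        est_errors = [0.0] * len(ranges)
    x, y = x0
    for _ in range(iters):
        # Accumulate the 2x2 normal equations (J^T J) d = -J^T res.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (ax, ay), r, e in zip(anchors, ranges, est_errors):
            dx, dy = x - ax, y - ay
            dist = math.hypot(dx, dy) or 1e-12  # avoid division by zero
            res = dist - (r - e)
            jx, jy = dx / dist, dy / dist  # gradient of dist w.r.t. (x, y)
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 -= jx * res
            b2 -= jy * res
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate geometry; stop updating
        x += (a22 * b1 - a12 * b2) / det
        y += (a11 * b2 - a12 * b1) / det
    return x, y
```

In this sketch, when the estimated errors match the true range biases exactly, the corrected residuals vanish at the true position, so the solver recovers it; with no correction, the biased ranges pull the estimate away from the true position.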
