Moshe Wasserblat

SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models

Feb 13, 2025

HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly

Oct 03, 2024

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

Aug 05, 2024

Distributed Speculative Inference of Large Language Models

May 23, 2024

Accelerating Speculative Decoding using Dynamic Speculation Length

May 07, 2024

CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity

Apr 16, 2024

Optimizing Retrieval-augmented Reader Models via Token Elimination

Oct 20, 2023

An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs

Jun 28, 2023

QuaLA-MiniLM: a Quantized Length Adaptive MiniLM

Oct 31, 2022

Fast DistilBERT on CPUs

Oct 27, 2022