Moshe Wasserblat

HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly

Oct 03, 2024

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

Aug 05, 2024

Distributed Speculative Inference of Large Language Models

May 23, 2024

Accelerating Speculative Decoding using Dynamic Speculation Length

May 07, 2024

CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity

Apr 16, 2024

Optimizing Retrieval-augmented Reader Models via Token Elimination

Oct 20, 2023

An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs

Jun 28, 2023

QuaLA-MiniLM: a Quantized Length Adaptive MiniLM

Oct 31, 2022

Fast DistilBERT on CPUs

Oct 27, 2022

Cross-Domain Aspect Extraction using Transformers Augmented with Knowledge Graphs

Oct 18, 2022