Irina Rish

Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training

Mar 06, 2025
Via arXiv

Continual Pre-training of MoEs: How robust is your router?

Mar 06, 2025
Via arXiv

Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark

Jan 16, 2025
Via arXiv

Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference

Dec 18, 2024
Via arXiv

RedPajama: an Open Dataset for Training Large Language Models

Nov 19, 2024
Via arXiv

Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

Nov 11, 2024
Via arXiv

Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning

Nov 04, 2024
Via arXiv

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Oct 24, 2024
Via arXiv

VFA: Vision Frequency Analysis of Foundation Models and Human

Sep 09, 2024
Via arXiv

Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models

Jul 17, 2024
Via arXiv