Bryan Catanzaro

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

Mar 06, 2025

FeatSharp: Your Vision Model Features, Sharper

Feb 22, 2025

A2SB: Audio-to-Audio Schrodinger Bridges

Jan 20, 2025

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

Dec 30, 2024

ETTA: Elucidating the Design Space of Text-to-Audio Models

Dec 26, 2024

AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling

Dec 19, 2024

Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining

Dec 18, 2024

RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models

Dec 10, 2024

Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset

Dec 03, 2024

MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs

Nov 04, 2024