Jan Kautz

NVIDIA

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Apr 17, 2025

Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning

Apr 15, 2025

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Apr 10, 2025

One-Minute Video Generation with Test-Time Training

Apr 07, 2025

OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning

Apr 06, 2025

Scaling Vision Pre-Training to 4K Resolution

Mar 25, 2025

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Mar 18, 2025

Token-Efficient Long Video Understanding for Multimodal LLMs

Mar 06, 2025

FeatSharp: Your Vision Model Features, Sharper

Feb 22, 2025

QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation

Feb 07, 2025