Kavosh Asadi

C-3DPO: Constrained Controlled Classification for Direct Preference Optimization

Feb 22, 2025

Adjoint sharding for very long context training of state space models

Jan 01, 2025

Learning the Target Network in Function Space

Jun 03, 2024

TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models

Oct 09, 2023

TD Convergence: An Optimization Perspective

Jun 30, 2023

Resetting the Optimizer in Deep RL: An Empirical Study

Jun 30, 2023

Characterizing the Action-Generalization Gap in Deep Q-Learning

May 11, 2022

Deep Q-Network with Proximal Iteration

Dec 10, 2021

Coarse-Grained Smoothness for RL in Metric Spaces

Oct 23, 2021

Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback

Sep 15, 2021