Hardik Shah

Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and Adaptability

Jan 31, 2024
Via arXiv

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

Nov 17, 2023
Via arXiv

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

Jul 11, 2023
Via arXiv

End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates

Jun 13, 2023
Via arXiv

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

Mar 31, 2023
Via arXiv

Tell Your Story: Task-Oriented Dialogs for Interactive Content Creation

Nov 08, 2022
Via arXiv

Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks

Oct 10, 2022
Via arXiv

VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

Oct 09, 2022
Via arXiv