Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Harsh Gupta

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026

Aaron Adcock, Aayushi Srivastava, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pande, Abhinav Pandey, Abhinav Sharma, Abhishek Kadian, Abhishek Kumawat, Adam Kelsey(+1295 more)

Abstract:This document consolidates publicly reported technical details about Metas Llama 4 model family. It summarizes (i) released variants (Scout and Maverick) and the broader herd context including the previewed Behemoth teacher model, (ii) architectural characteristics beyond a high-level MoE description covering routed/shared-expert structure, early-fusion multimodality, and long-context design elements reported for Scout (iRoPE and length generalization strategies), (iii) training disclosures spanning pre-training, mid-training for long-context extension, and post-training methodology (lightweight SFT, online RL, and lightweight DPO) as described in release materials, (iv) developer-reported benchmark results for both base and instruction-tuned checkpoints, and (v) practical deployment constraints observed across major serving environments, including provider-specific context limits and quantization packaging. The manuscript also summarizes licensing obligations relevant to redistribution and derivative naming, and reviews publicly described safeguards and evaluation practices. The goal is to provide a compact technical reference for researchers and practitioners who need precise, source-backed facts about Llama 4.

* 15 pages

Via

Access Paper or Ask Questions

Sensor-Invariant Tactile Representation

Feb 27, 2025

Harsh Gupta, Yuchen Mo, Shengmiao Jin, Wenzhen Yuan

Abstract:High-resolution tactile sensors have become critical for embodied perception and robotic manipulation. However, a key challenge in the field is the lack of transferability between sensors due to design and manufacturing variations, which result in significant differences in tactile signals. This limitation hinders the ability to transfer models or knowledge learned from one sensor to another. To address this, we introduce a novel method for extracting Sensor-Invariant Tactile Representations (SITR), enabling zero-shot transfer across optical tactile sensors. Our approach utilizes a transformer-based architecture trained on a diverse dataset of simulated sensor designs, allowing it to generalize to new sensors in the real world with minimal calibration. Experimental results demonstrate the method's effectiveness across various tactile sensing applications, facilitating data and model transferability for future advancements in the field.

* Accepted to ICLR'25

Via

Access Paper or Ask Questions

Text2Place: Affordance-aware Text Guided Human Placement

Jul 22, 2024

Rishubh Parihar, Harsh Gupta, Sachidanand VS, R. Venkatesh Babu

Abstract:For a given scene, humans can easily reason for the locations and pose to place objects. Designing a computational model to reason about these affordances poses a significant challenge, mirroring the intuitive reasoning abilities of humans. This work tackles the problem of realistic human insertion in a given background scene termed as \textbf{Semantic Human Placement}. This task is extremely challenging given the diverse backgrounds, scale, and pose of the generated person and, finally, the identity preservation of the person. We divide the problem into the following two stages \textbf{i)} learning \textit{semantic masks} using text guidance for localizing regions in the image to place humans and \textbf{ii)} subject-conditioned inpainting to place a given subject adhering to the scene affordance within the \textit{semantic masks}. For learning semantic masks, we leverage rich object-scene priors learned from the text-to-image generative models and optimize a novel parameterization of the semantic mask, eliminating the need for large-scale training. To the best of our knowledge, we are the first ones to provide an effective solution for realistic human placements in diverse real-world scenes. The proposed method can generate highly realistic scene compositions while preserving the background and subject identity. Further, we present results for several downstream tasks - scene hallucination from a single or multiple generated persons and text-based attribute editing. With extensive comparisons against strong baselines, we show the superiority of our method in realistic human placement.

* ECCV 2024, Project Page: https://rishubhpar.github.io/Text2Place/

Via

Access Paper or Ask Questions

Cross-Geography Generalization of Machine Learning Methods for Classification of Flooded Regions in Aerial Images

Oct 04, 2022

Sushant Lenka, Pratyush Kerhalkar, Pranav Shetty, Harsh Gupta, Bhavam Vidyarthi, Ujjwal Verma

Figure 1 for Cross-Geography Generalization of Machine Learning Methods for Classification of Flooded Regions in Aerial Images

Figure 2 for Cross-Geography Generalization of Machine Learning Methods for Classification of Flooded Regions in Aerial Images

Figure 3 for Cross-Geography Generalization of Machine Learning Methods for Classification of Flooded Regions in Aerial Images

Figure 4 for Cross-Geography Generalization of Machine Learning Methods for Classification of Flooded Regions in Aerial Images

Abstract:Identification of regions affected by floods is a crucial piece of information required for better planning and management of post-disaster relief and rescue efforts. Traditionally, remote sensing images are analysed to identify the extent of damage caused by flooding. The data acquired from sensors onboard earth observation satellites are analyzed to detect the flooded regions, which can be affected by low spatial and temporal resolution. However, in recent years, the images acquired from Unmanned Aerial Vehicles (UAVs) have also been utilized to assess post-disaster damage. Indeed, a UAV based platform can be rapidly deployed with a customized flight plan and minimum dependence on the ground infrastructure. This work proposes two approaches for identifying flooded regions in UAV aerial images. The first approach utilizes texture-based unsupervised segmentation to detect flooded areas, while the second uses an artificial neural network on the texture features to classify images as flooded and non-flooded. Unlike the existing works where the models are trained and tested on images of the same geographical regions, this work studies the performance of the proposed model in identifying flooded regions across geographical regions. An F1-score of 0.89 is obtained using the proposed segmentation-based approach which is higher than existing classifiers. The robustness of the proposed approach demonstrates that it can be utilized to identify flooded regions of any region with minimum or no user intervention.

Via

Access Paper or Ask Questions

Combining Reinforcement Learning with Model Predictive Control for On-Ramp Merging

Nov 17, 2020

Joseph Lubars, Harsh Gupta, Adnan Raja, R. Srikant, Liyun Li, Xinzhou Wu

Figure 1 for Combining Reinforcement Learning with Model Predictive Control for On-Ramp Merging

Figure 2 for Combining Reinforcement Learning with Model Predictive Control for On-Ramp Merging

Figure 3 for Combining Reinforcement Learning with Model Predictive Control for On-Ramp Merging

Figure 4 for Combining Reinforcement Learning with Model Predictive Control for On-Ramp Merging

Abstract:We consider the problem of designing an algorithm to allow a car to autonomously merge on to a highway from an on-ramp. Two broad classes of techniques have been proposed to solve motion planning problems in autonomous driving: Model Predictive Control (MPC) and Reinforcement Learning (RL). In this paper, we first establish the strengths and weaknesses of state-of-the-art MPC and RL-based techniques through simulations. We show that the performance of the RL agent is worse than that of the MPC solution from the perspective of safety and robustness to out-of-distribution traffic patterns, i.e., traffic patterns which were not seen by the RL agent during training. On the other hand, the performance of the RL agent is better than that of the MPC solution when it comes to efficiency and passenger comfort. We subsequently present an algorithm which blends the model-free RL agent with the MPC solution and show that it provides better trade-offs between all metrics -- passenger comfort, efficiency, crash rate and robustness.

* 7 pages, 6 figures

Via

Access Paper or Ask Questions

Provably-Efficient Double Q-Learning

Jul 09, 2020

Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant

Figure 1 for Provably-Efficient Double Q-Learning

Figure 2 for Provably-Efficient Double Q-Learning

Figure 3 for Provably-Efficient Double Q-Learning

Abstract:In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning. Our result builds upon an analysis for linear stochastic approximation based on Lyapunov equations and applies to both tabular setting and with linear function approximation, provided that the optimal policy is unique and the algorithms converge. We show that the asymptotic mean-squared error of Double Q-learning is exactly equal to that of Q-learning if Double Q-learning uses twice the learning rate of Q-learning and outputs the average of its two estimators. We also present some practical implications of this theoretical observation using simulations.

* 15 pages, 3 figures, 1 table

Via

Access Paper or Ask Questions

Mixed Logit Models and Network Formation

Jun 30, 2020

Harsh Gupta, Mason A. Porter

Figure 1 for Mixed Logit Models and Network Formation

Figure 2 for Mixed Logit Models and Network Formation

Figure 3 for Mixed Logit Models and Network Formation

Figure 4 for Mixed Logit Models and Network Formation

Abstract:The study of network formation is pervasive in economics, sociology, and many other fields. In this paper, we model network formation as a ``choice'' that is made by nodes in a network to connect to other nodes. We study these ``choices'' using discrete-choice models, in which an agent chooses between two or more discrete alternatives. One framework for studying network formation is the multinomial logit (MNL) model. We highlight limitations of the MNL model on networks that are constructed from empirical data. We employ the ``repeated choice'' (RC) model to study network formation \cite{TrainRevelt97mixedlogit}. We argue that the RC model overcomes important limitations of the MNL model and is well-suited to study network formation. We also illustrate how to use the RC model to accurately study network formation using both synthetic and real-world networks. Using synthetic networks, we also compare the performance of the MNL model and the RC model; we find that the RC model estimates the data-generation process of our synthetic networks more accurately than the MNL model. We provide examples of qualitatively interesting questions -- the presence of homophily in a teen friendship network and the fact that new patents are more likely to cite older, more cited, and similar patents -- for which the RC model allows us to achieve insights.

Via

Access Paper or Ask Questions

Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

Jul 14, 2019

Harsh Gupta, R. Srikant, Lei Ying

Figure 1 for Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

Figure 2 for Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

Abstract:We study two time-scale linear stochastic approximation algorithms, which can be used to model well-known reinforcement learning algorithms such as GTD, GTD2, and TDC. We present finite-time performance bounds for the case where the learning rate is fixed. The key idea in obtaining these bounds is to use a Lyapunov function motivated by singular perturbation theory for linear differential equations. We use the bound to design an adaptive learning rate scheme which significantly improves the convergence rate over the known optimal polynomial decay rule in our experiments, and can be used to potentially improve the performance of any other schedule where the learning rate is changed at pre-determined time instants.

* 17 pages, 3 figures, preprint

Via

Access Paper or Ask Questions

Almost Boltzmann Exploration

Jan 25, 2019

Harsh Gupta, Seo Taek Kong, R. Srikant, Weina Wang

Figure 1 for Almost Boltzmann Exploration

Figure 2 for Almost Boltzmann Exploration

Figure 3 for Almost Boltzmann Exploration

Figure 4 for Almost Boltzmann Exploration

Abstract:Boltzmann exploration is widely used in reinforcement learning to provide a trade-off between exploration and exploitation. Recently, in (Cesa-Bianchi et al., 2017) it has been shown that pure Boltzmann exploration does not perform well from a regret perspective, even in the simplest setting of stochastic multi-armed bandit (MAB) problems. In this paper, we show that a simple modification to Boltzmann exploration, motivated by a variation of the standard doubling trick, achieves $O(K\log^{1+\alpha} T)$ regret for a stochastic MAB problem with $K$ arms, where $\alpha>0$ is a parameter of the algorithm. This improves on the result in (Cesa-Bianchi et al., 2017), where an algorithm inspired by the Gumbel-softmax trick achieves $O(K\log^2 T)$ regret. We also show that our algorithm achieves $O(\beta(G) \log^{1+\alpha} T)$ regret in stochastic MAB problems with graph-structured feedback, without knowledge of the graph structure, where $\beta(G)$ is the independence number of the feedback graph. Additionally, we present extensive experimental results on real datasets and applications for multi-armed bandits with both traditional bandit feedback and graph-structured feedback. In all cases, our algorithm performs as well or better than the state-of-the-art.

* 12 pages, 14 figures

Via

Access Paper or Ask Questions

Multimodal Content Analysis for Effective Advertisements on YouTube

Sep 12, 2017

Nikhita Vedula, Wei Sun, Hyunhwan Lee, Harsh Gupta, Mitsunori Ogihara, Joseph Johnson, Gang Ren, Srinivasan Parthasarathy

Figure 1 for Multimodal Content Analysis for Effective Advertisements on YouTube

Figure 2 for Multimodal Content Analysis for Effective Advertisements on YouTube

Figure 3 for Multimodal Content Analysis for Effective Advertisements on YouTube

Figure 4 for Multimodal Content Analysis for Effective Advertisements on YouTube

Abstract:The rapid advances in e-commerce and Web 2.0 technologies have greatly increased the impact of commercial advertisements on the general public. As a key enabling technology, a multitude of recommender systems exists which analyzes user features and browsing patterns to recommend appealing advertisements to users. In this work, we seek to study the characteristics or attributes that characterize an effective advertisement and recommend a useful set of features to aid the designing and production processes of commercial advertisements. We analyze the temporal patterns from multimedia content of advertisement videos including auditory, visual and textual components, and study their individual roles and synergies in the success of an advertisement. The objective of this work is then to measure the effectiveness of an advertisement, and to recommend a useful set of features to advertisement designers to make it more successful and approachable to users. Our proposed framework employs the signal processing technique of cross modality feature learning where data streams from different components are employed to train separate neural network models and are then fused together to learn a shared representation. Subsequently, a neural network model trained on this joint feature embedding representation is utilized as a classifier to predict advertisement effectiveness. We validate our approach using subjective ratings from a dedicated user study, the sentiment strength of online viewer comments, and a viewer opinion metric of the ratio of the Likes and Views received by each advertisement from an online platform.

* 11 pages, 5 figures, ICDM 2017

Via

Access Paper or Ask Questions