Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

John Cooper

R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

May 01, 2025

Albert Ge, Tzu-Heng Huang, John Cooper, Avi Trost, Ziyi Chu, Satya Sai Srinath Namburi GNVV, Ziyang Cai, Kendall Park, Nicholas Roberts, Frederic Sala

Figure 1 for R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

Figure 2 for R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

Figure 3 for R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

Figure 4 for R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

Abstract:Data mixing strategies have successfully reduced the costs involved in training language models. While promising, such methods suffer from two flaws. First, they rely on predetermined data domains (e.g., data sources, task types), which may fail to capture critical semantic nuances, leaving performance on the table. Second, these methods scale with the number of domains in a computationally prohibitive way. We address these challenges via R&B, a framework that re-partitions training data based on semantic similarity (Regroup) to create finer-grained domains, and efficiently optimizes the data composition (Balance) by leveraging a Gram matrix induced by domain gradients obtained throughout training. Unlike prior works, it removes the need for additional compute to obtain evaluation information such as losses or gradients. We analyze this technique under standard regularity conditions and provide theoretical insights that justify R&B's effectiveness compared to non-adaptive mixing approaches. Empirically, we demonstrate the effectiveness of R&B on five diverse datasets ranging from natural language to reasoning and multimodal tasks. With as little as 0.01% additional compute overhead, R&B matches or exceeds the performance of state-of-the-art data mixing strategies.

Via

Access Paper or Ask Questions

Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization

Apr 14, 2025

Darryl Hannan, John Cooper, Dylan White, Timothy Doster, Henry Kvinge, Yijing Watkins

Figure 1 for Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization

Figure 2 for Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization

Figure 3 for Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization

Figure 4 for Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization

Abstract:Multimodal large language models (MLLMs) have altered the landscape of computer vision, obtaining impressive results across a wide range of tasks, especially in zero-shot settings. Unfortunately, their strong performance does not always transfer to out-of-distribution domains, such as earth observation (EO) imagery. Prior work has demonstrated that MLLMs excel at some EO tasks, such as image captioning and scene understanding, while failing at tasks that require more fine-grained spatial reasoning, such as object localization. However, MLLMs are advancing rapidly and insights quickly become out-dated. In this work, we analyze more recent MLLMs that have been explicitly trained to include fine-grained spatial reasoning capabilities, benchmarking them on EO object localization tasks. We demonstrate that these models are performant in certain settings, making them well suited for zero-shot scenarios. Additionally, we provide a detailed discussion focused on prompt selection, ground sample distance (GSD) optimization, and analyzing failure cases. We hope that this work will prove valuable as others evaluate whether an MLLM is well suited for a given EO localization task and how to optimize it.

* 26 pages, CVPR MORSE Workshop 2025

Via

Access Paper or Ask Questions

Weak-to-Strong Generalization Through the Data-Centric Lens

Dec 05, 2024

Changho Shin, John Cooper, Frederic Sala

Abstract:The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While decades of research have resulted in numerous algorithms that produce strong empirical performance, understanding what aspects of data enable weak-to-strong generalization has been understudied. We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density. Intuitively, generalization tracks the number of points that contain overlaps, i.e., both easy patterns (learnable by a weak model) and challenging patterns (only learnable by a stronger model), as with such points, weak predictions can be used to learn challenging patterns by stronger models. We provide a practical overlap detection algorithm to find such points in datasets and leverage them to learn, among multiple sources of data, which to query when seeking to maximize overlap density and thereby enhance weak-to-strong generalization. We present a theoretical result showing that the generalization benefit is a function of the overlap density and a regret bound for our data selection algorithm. Empirically, we validate the mechanism and the overlap detection algorithm on a wide array of settings.

* 39 pages

Via

Access Paper or Ask Questions

Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

Oct 08, 2024

Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, Zack Sifakis, Angeliki Giannou, Ziqian Lin, Liu Yang, Saurabh Agarwal(+4 more)

Figure 1 for Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

Figure 2 for Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

Figure 3 for Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

Figure 4 for Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

Abstract:Large Language Models (LLMs) have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneously, during a single inference call, a capability we term "task superposition". We provide empirical evidence of this phenomenon across various LLM families and scales and show that this phenomenon emerges even if we train the model to in-context learn one task at a time. We offer theoretical explanations that this capability is well within the expressive power of transformers. We also explore how LLMs internally compose task vectors during superposition. Furthermore, we show that larger models can solve more ICL tasks in parallel, and better calibrate their output distribution. Our findings offer insights into the latent capabilities of LLMs, further substantiate the perspective of "LLMs as superposition of simulators", and raise questions about the mechanisms enabling simultaneous task execution.

Via

Access Paper or Ask Questions

MoRe Fine-Tuning with 10x Fewer Parameters

Aug 30, 2024

Wenxuan Tan, Nicholas Roberts, Tzu-Heng Huang, Jitian Zhao, John Cooper, Samuel Guo, Chengyu Duan, Frederic Sala

Figure 1 for MoRe Fine-Tuning with 10x Fewer Parameters

Figure 2 for MoRe Fine-Tuning with 10x Fewer Parameters

Figure 3 for MoRe Fine-Tuning with 10x Fewer Parameters

Figure 4 for MoRe Fine-Tuning with 10x Fewer Parameters

Abstract:Parameter-efficient fine-tuning (PEFT) techniques have unlocked the potential to cheaply and easily specialize large pretrained models. However, the most prominent approaches, like low-rank adapters (LoRA), depend on heuristics or rules-of-thumb for their architectural choices -- potentially limiting their performance for new models and architectures. This limitation suggests that techniques from neural architecture search could be used to obtain optimal adapter architectures, but these are often expensive and difficult to implement. We address this challenge with Monarch Rectangular Fine-tuning (MoRe), a simple framework to search over adapter architectures that relies on the Monarch matrix class. Theoretically, we show that MoRe is more expressive than LoRA. Empirically, our approach is more parameter-efficient and performant than state-of-the-art PEFTs on a range of tasks and models, with as few as 5\% of LoRA's parameters.

Via

Access Paper or Ask Questions

Inverse Kinematics and Sensitivity Minimization of an n-Stack Stewart Platform

Nov 13, 2018

David Balaban, John Cooper, Erik Komendera

Figure 1 for Inverse Kinematics and Sensitivity Minimization of an n-Stack Stewart Platform

Figure 2 for Inverse Kinematics and Sensitivity Minimization of an n-Stack Stewart Platform

Figure 3 for Inverse Kinematics and Sensitivity Minimization of an n-Stack Stewart Platform

Figure 4 for Inverse Kinematics and Sensitivity Minimization of an n-Stack Stewart Platform

Abstract:An autonomous system is presented to solve the problem of in space assembly, which can be used to further the NASA goal of deep space exploration. Of particular interest is the assembly of large truss structures, which requires precise and dexterous movement in a changing environment. A prototype of an autonomous manipulator called "Assemblers" was fabricated from an aggregation of Stewart Platform robots for the purpose of researching autonomous in space assembly capabilities. The forward kinematics for an Assembler is described by the set of translations and rotation angles for each component Stewart Platform, from which the position and orientation of the end effector are simple to calculate. However, selecting inverse kinematic poses, defined by the translations and rotation angles, for the Assembler requires coordination between each Stewart Platform and is an underconstrained non-linear optimization problem. For assembly tasks, it is ideal that the pose selected has the least sensitivity to disturbances possible. A method of sensitivity reduction is proposed by minimizing the Frobenius Norm (FN) of the Jacobian for the forward kinematics. The effectiveness of the FN method will be demonstrated through a Monte Carlo simulation method to model random motion internal to the structure.

Via

Access Paper or Ask Questions