Christopher Hoang

Discrete JEPA: Learning Discrete Token Representations without Reconstruction

Jun 17, 2025

Junyeob Baek, Hosung Lee, Christopher Hoang, Mengye Ren, Sungjin Ahn

Abstract: The cornerstone of cognitive intelligence lies in extracting hidden patterns from observations and leveraging these principles to systematically predict future outcomes. However, current image tokenization methods demonstrate significant limitations in tasks requiring symbolic abstraction and logical reasoning capabilities essential for systematic inference. To address this challenge, we propose Discrete-JEPA, extending the latent predictive coding framework with semantic tokenization and novel complementary objectives to create robust tokenization for symbolic reasoning tasks. Discrete-JEPA dramatically outperforms baselines on visual symbolic prediction tasks, while striking visual evidence reveals the spontaneous emergence of deliberate systematic patterns within the learned semantic token space. Though an initial model, our approach promises a significant impact for advancing symbolic world modeling and planning capabilities in artificial intelligence systems.

PooDLe: Pooled and dense self-supervised learning from naturalistic videos

Aug 20, 2024

Alex N. Wang, Christopher Hoang, Yuwen Xiong, Yann LeCun, Mengye Ren

Abstract: Self-supervised learning has driven significant progress in learning from single-subject, iconic images. However, there are still unanswered questions about the use of minimally-curated, naturalistic video data, which contain dense scenes with many independent objects, imbalanced class distributions, and varying object sizes. In this paper, we propose a novel approach that combines an invariance-based SSL objective on pooled representations with a dense SSL objective that enforces equivariance to optical flow warping. Our findings indicate that a unified objective applied at multiple feature scales is essential for learning effective image representations from high-resolution, naturalistic videos. We validate our approach on the BDD100K driving video dataset and the Walking Tours first-person video dataset, demonstrating its ability to capture spatial understanding from a dense objective and semantic understanding via a pooled representation objective.

* Project page: https://poodle-ssl.github.io

Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning

Nov 18, 2021

Christopher Hoang, Sungryull Sohn, Jongwook Choi, Wilka Carvalho, Honglak Lee

Abstract: Operating in the real world often requires agents to learn about a complex environment and apply this understanding to achieve a breadth of goals. This problem, known as goal-conditioned reinforcement learning (GCRL), becomes especially challenging for long-horizon goals. Current methods have tackled this problem by augmenting goal-conditioned policies with graph-based planning algorithms. However, they struggle to scale to large, high-dimensional state spaces and assume access to exploration mechanisms for efficiently collecting training data. In this work, we introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments so as to obtain a policy that is proficient for any goal. SFL leverages the ability of successor features (SF) to capture transition dynamics, using it to drive exploration by estimating state-novelty and to enable high-level planning by abstracting the state-space as a non-parametric landmark-based graph. We further exploit SF to directly compute a goal-conditioned policy for inter-landmark traversal, which we use to execute plans to "frontier" landmarks at the edge of the explored state space. We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces and outperforms state-of-the-art baselines on long-horizon GCRL tasks.

* NeurIPS 2021. Video and code at https://2016choang.github.io/sfl
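
The abstract above says successor features capture transition dynamics. As a rough illustration of that general idea only (not the SFL implementation), the sketch below computes tabular successor features on a toy 4-state chain. The chain, the always-step-right policy, the one-hot features, and the names `next_state`, `one_hot`, and `successor_features` are all assumptions made for this example; none of them come from the paper.

```python
# Toy illustration only: tabular successor features on a 4-state chain.
# Nothing here is taken from the SFL codebase; all names are hypothetical.

GAMMA = 0.9      # discount factor (assumed value)
N_STATES = 4     # states 0..3; state 3 is absorbing


def next_state(s):
    """Assumed fixed policy: always step right, stay put in the last state."""
    return min(s + 1, N_STATES - 1)


def one_hot(s):
    """One-hot state features phi(s)."""
    return [1.0 if i == s else 0.0 for i in range(N_STATES)]


def successor_features(gamma=GAMMA, sweeps=200):
    """Iterate psi(s) <- phi(s) + gamma * psi(next_state(s)) to a fixed point."""
    psi = [one_hot(s) for s in range(N_STATES)]
    for _ in range(sweeps):
        psi = [
            [p + gamma * q for p, q in zip(one_hot(s), psi[next_state(s)])]
            for s in range(N_STATES)
        ]
    return psi


if __name__ == "__main__":
    for s, row in enumerate(successor_features()):
        print(s, [round(x, 3) for x in row])
```

With one-hot features, entry `psi[s][j]` is the discounted number of future visits to state `j` when starting from `s`, so states that lead to similar futures get similar vectors. That is the sense in which the features encode the transition dynamics under the chosen policy.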
