Jerry Liu

Towards Learning High-Precision Least Squares Algorithms with Sequence Models

Mar 15, 2025

Jerry Liu, Jessica Grogan, Owen Dugan, Ashish Rao, Simran Arora, Atri Rudra, Christopher Ré

Abstract: This paper investigates whether sequence models can learn to perform numerical algorithms, e.g. gradient descent, on the fundamental problem of least squares. Our goal is to inherit two properties of standard algorithms from numerical analysis: (1) machine precision, i.e. we want to obtain solutions that are accurate to near floating point error, and (2) numerical generality, i.e. we want them to apply broadly across problem instances. We find that prior approaches using Transformers fail to meet these criteria, and identify limitations present in existing architectures and training procedures. First, we show that softmax Transformers struggle to perform high-precision multiplications, which prevents them from precisely learning numerical algorithms. Second, we identify an alternate class of architectures, composed entirely of polynomials, that can efficiently represent high-precision gradient descent iterates. Finally, we investigate precision bottlenecks during training and address them via a high-precision training recipe that reduces stochastic gradient noise. Our recipe enables us to train two polynomial architectures, gated convolutions and linear attention, to perform gradient descent iterates on least squares problems. For the first time, we demonstrate the ability to train to near machine precision. Applied iteratively, our models obtain 100,000x lower MSE than standard Transformers trained end-to-end, and they incur a 10,000x smaller generalization gap on out-of-distribution problems. We make progress towards end-to-end learning of numerical algorithms for least squares.

* 75 pages, 18 figures. ICLR 2025
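
As a rough illustration of the kind of iterate the abstract above refers to, the following minimal sketch runs plain gradient descent on a small least squares problem in pure Python. It is not the paper's learned model or training recipe; the function names, the step size 0.15, and the 3x2 example matrix are assumptions chosen for illustration.

```python
# Hypothetical illustration: plain gradient descent on min_x 0.5 * ||Ax - b||^2.
# This is the classical algorithm, not the paper's sequence model; all names
# (matvec, transpose, gd_least_squares) and constants are our own assumptions.

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def gd_least_squares(A, b, step, iters):
    """Iterate x <- x - step * A^T (A x - b), starting from x = 0."""
    At = transpose(A)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [r - bi for r, bi in zip(matvec(A, x), b)]
        grad = matvec(At, residual)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x

A = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_true = [1.0, -2.0]
b = matvec(A, x_true)  # consistent system, so the minimizer is x_true
x_hat = gd_least_squares(A, b, step=0.15, iters=500)
max_err = max(abs(u - v) for u, v in zip(x_hat, x_true))
```

On this small, well-conditioned system the error `max_err` should fall to roughly the level of floating point round-off, which is the sense of "machine precision" used in the abstract.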

FullStack Bench: Evaluating LLMs as Full Stack Coders

Dec 03, 2024

Siyao Liu, He Zhu, Jerry Liu, Shulin Xin, Aoyan Li, Rui Long, Li Chen, Jack Yang, Jinxiang Xia, Z. Y. Peng(+7 more)

Abstract: As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are rapidly increasing. However, most existing datasets only evaluate limited application domains. To address this gap, we have developed FullStack Bench, a comprehensive code evaluation dataset focusing on full-stack programming, which encompasses a wide range of application domains (e.g., basic programming, data analysis, software engineering, mathematics, and machine learning). In addition, to assess multilingual programming capabilities, FullStack Bench includes real-world instructions and corresponding unit test cases in 16 widely used programming languages, designed to reflect real-world usage scenarios rather than simple translations. We also release an effective code sandbox execution tool, SandboxFusion, which supports various programming languages and packages, to evaluate performance on FullStack Bench efficiently. Comprehensive experimental results demonstrate the necessity and effectiveness of FullStack Bench and SandboxFusion.

* 26 pages

Deep Structured Reactive Planning

Jan 18, 2021

Jerry Liu, Wenyuan Zeng, Raquel Urtasun, Ersin Yumer

Abstract: An intelligent agent operating in the real world must balance achieving its goal with maintaining the safety and comfort of not only itself, but also other participants within the surrounding scene. This requires jointly reasoning about the behavior of other actors while deciding its own actions, as these two processes are inherently intertwined: a vehicle will yield to us if we decide to proceed first at the intersection but will proceed first if we decide to yield. However, this is not captured in most self-driving pipelines, where planning follows prediction. In this paper, we propose a novel data-driven, reactive planning objective which allows a self-driving vehicle to jointly reason about its own plans as well as how other actors will react to them. We formulate the problem as an energy-based deep structured model that is learned from observational data and encodes both the planning and prediction problems. Through simulations based on both real-world driving and synthetically generated dense traffic, we demonstrate that our reactive model outperforms a non-reactive variant in successfully completing highly complex maneuvers (lane merges/turns in traffic) faster, without trading off collision rate.

MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models

Nov 15, 2020

Sourav Biswas, Jerry Liu, Kelvin Wong, Shenlong Wang, Raquel Urtasun

Abstract: We present a novel compression algorithm for reducing the storage of LiDAR sensor data streams. Our model exploits spatio-temporal relationships across multiple LiDAR sweeps to reduce the bitrate of both geometry and intensity values. Towards this goal, we propose a novel conditional entropy model that models the probabilities of the octree symbols by considering both coarse-level geometry and previous sweeps' geometric and intensity information. We then use the learned probability to encode the full data stream into a compact one. Our experiments demonstrate that our method significantly reduces the joint geometry and intensity bitrate over prior state-of-the-art LiDAR compression methods, with a reduction of 7-17% and 15-35% on the UrbanCity and SemanticKITTI datasets, respectively.

* NeurIPS 2020

Conditional Entropy Coding for Efficient Video Compression

Aug 20, 2020

Jerry Liu, Shenlong Wang, Wei-Chiu Ma, Meet Shah, Rui Hu, Pranaab Dhawan, Raquel Urtasun

Abstract: We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames. Unlike prior learning-based approaches, we reduce complexity by not performing any form of explicit transformations between frames and assume each frame is encoded with an independent state-of-the-art deep image compressor. We first show that a simple architecture modeling the entropy between the image latent codes is competitive with other neural video compression works and video codecs while being much faster and easier to implement. We then propose a novel internal learning extension on top of this architecture that brings an additional 10% bitrate savings without trading off decoding speed. Importantly, we show that our approach outperforms H.265 and other deep learning baselines in MS-SSIM on higher bitrate UVG video, and against all video codecs on lower framerates, while being thousands of times faster in decoding than deep models utilizing an autoregressive entropy model.

* ECCV 2020
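
To make the idea of coding one frame conditioned on another more concrete, the sketch below compares the ideal entropy-coded length of a symbol stream under a marginal model and under a model conditioned on the co-located symbol of the previous frame. It is a toy with empirical counts, not the paper's learned entropy model: the function names and the two 8-symbol "frames" are assumptions, and the cost of transmitting the probability tables is ignored.

```python
import math
from collections import Counter

# Hypothetical toy: ideal code lengths (sum of -log2 p) under empirical models.
# A learned conditional entropy model would replace these count-based tables.

def marginal_bits(stream):
    """Ideal bits when each symbol is coded with its empirical marginal probability."""
    counts = Counter(stream)
    total = len(stream)
    return sum(-math.log2(counts[s] / total) for s in stream)

def conditional_bits(stream, contexts):
    """Ideal bits when each symbol is coded with its empirical probability given a context."""
    pair_counts = Counter(zip(contexts, stream))
    ctx_counts = Counter(contexts)
    return sum(-math.log2(pair_counts[(c, s)] / ctx_counts[c])
               for c, s in zip(contexts, stream))

# Made-up quantized codes for two consecutive frames; they differ in one position.
prev_frame = [0, 0, 1, 1, 0, 0, 1, 1]
curr_frame = [0, 0, 1, 1, 0, 1, 1, 1]

bits_marginal = marginal_bits(curr_frame)
bits_conditional = conditional_bits(curr_frame, prev_frame)
```

Because the current frame is almost fully predictable from the previous one in this toy, `bits_conditional` comes out well below `bits_marginal`, which is the effect a conditional entropy model aims to exploit.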

OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression

May 14, 2020

Lila Huang, Shenlong Wang, Kelvin Wong, Jerry Liu, Raquel Urtasun

Abstract: We present a novel deep compression algorithm to reduce the memory footprint of LiDAR point clouds. Our method exploits the sparsity and structural redundancy between points to reduce the bitrate. Towards this goal, we first encode the LiDAR points into an octree, a data-efficient structure suitable for sparse point clouds. We then design a tree-structured conditional entropy model that models the probabilities of the octree symbols to encode the octree into a compact bitstream. We validate the effectiveness of our method on two large-scale datasets. The results demonstrate that our approach reduces the bitrate by 10-20% at the same reconstruction quality, compared to the previous state-of-the-art. Importantly, we also show that for the same bitrate, our approach outperforms other compression algorithms when performing downstream 3D segmentation and detection tasks using compressed representations. Our algorithm can be used to reduce the onboard and offboard storage of LiDAR points for applications such as self-driving cars, where a single vehicle captures 84 billion points per day.

* CVPR 2020 (Oral)
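
The octree step described in the abstract above can be sketched as follows: points are quantized to a voxel grid, and each internal node is summarized by an 8-bit symbol recording which of its eight children are occupied. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name, the unit-cube input range, the breadth-first ordering, and the child bit order are all our own choices.

```python
# Hypothetical sketch: turn points in the unit cube [0, 1)^3 into breadth-first
# 8-bit octree occupancy symbols. Name, bit order, and traversal order are
# assumptions for illustration only.

def octree_symbols(points, depth):
    # Quantize each point to integer voxel coordinates at the finest level.
    n = 1 << depth
    voxels = {tuple(min(int(c * n), n - 1) for c in p) for p in points}
    symbols = []
    for level in range(depth):
        shift = depth - level - 1  # shift that maps a voxel to its level+1 cell
        occupancy = {}
        for v in voxels:
            child = tuple(c >> shift for c in v)
            parent = tuple(c >> 1 for c in child)
            # Child index from the low bit of each axis: x is bit 2, y bit 1, z bit 0.
            bit = ((child[0] & 1) << 2) | ((child[1] & 1) << 1) | (child[2] & 1)
            occupancy[parent] = occupancy.get(parent, 0) | (1 << bit)
        # Emit one symbol per occupied node at this level, in sorted node order.
        symbols.extend(occupancy[p] for p in sorted(occupancy))
    return symbols

# Two points in opposite corners of the cube, octree of depth 2.
symbols = octree_symbols([(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)], depth=2)
# symbols == [129, 1, 128]: the root has children 0 and 7 occupied,
# then each of those children has a single occupied leaf.
```

An entropy model would then assign a probability to each of these symbols, and an entropy coder would turn those probabilities into the compact bitstream.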

DSIC: Deep Stereo Image Compression

Aug 09, 2019

Jerry Liu, Shenlong Wang, Raquel Urtasun

Abstract: In this paper, we tackle the problem of stereo image compression, and leverage the fact that the two images have overlapping fields of view to further compress the representations. Our approach leverages state-of-the-art single-image compression autoencoders and enhances the compression with novel parametric skip functions to feed fully differentiable, disparity-warped features at all levels to the encoder/decoder of the second image. Moreover, we model the probabilistic dependence between the image codes using a conditional entropy model. Our experiments show an impressive 30-50% reduction in the second image bitrate at low bitrates compared to deep single-image compression, and a 10-20% reduction at higher bitrates.

* Accepted at International Conference on Computer Vision 2019

Interactive 3D Modeling with a Generative Adversarial Network

Jan 07, 2018

Jerry Liu, Fisher Yu, Thomas Funkhouser

Abstract: This paper proposes the idea of using a generative adversarial network (GAN) to assist a novice user in designing real-world shapes with a simple interface. The user edits a voxel grid with a painting interface (like Minecraft). Yet, at any time, he/she can execute a SNAP command, which projects the current voxel grid onto a latent shape manifold with a learned projection operator and then generates a similar, but more realistic, shape using a learned generator network. Then the user can edit the resulting shape and snap again until he/she is satisfied with the result. The main advantage of this approach is that the projection and generation operators help novice users create 3D models characteristic of a background distribution of object shapes, but without having to specify all the details. The core new research idea is to use a GAN to support this application. 3D GANs have previously been used for shape generation, interpolation, and completion, but never for interactive modeling. The new challenge for this application is to learn a projection operator that takes an arbitrary 3D voxel model and produces a latent vector on the shape manifold from which a similar and realistic shape can be generated. We develop algorithms for this and other steps of the SNAP processing pipeline and integrate them into a simple modeling tool. Experiments with these algorithms and the tool suggest that GANs provide a promising approach to computer-assisted interactive modeling.

* Published at International Conference on 3D Vision 2017 (http://irc.cs.sdu.edu.cn/3dv/index.html)
