Alejandro Newell

CoMotion: Concurrent Multi-person 3D Motion

Apr 16, 2025

Alejandro Newell, Peiyun Hu, Lahav Lipson, Stephan R. Richter, Vladlen Koltun

Abstract: We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than match detections across time, poses are updated directly from a new input image, which enables online tracking through occlusion. We train on numerous image and video datasets, leveraging pseudo-labeled annotations, to produce a model that matches state-of-the-art systems in 3D pose estimation accuracy while being faster and more accurate in tracking multiple people through time. Code and weights are provided at https://github.com/apple/ml-comotion

* Accepted at ICLR 2025. Code and weights are available at https://github.com/apple/ml-comotion

Infinite Photorealistic Worlds using Procedural Generation

Jun 26, 2023

Alexander Raistrick, Lahav Lipson, Zeyu Ma, Lingjie Mei, Mingzhe Wang, Yiming Zuo, Karhan Kayan, Hongyu Wen, Beining Han, Yihan Wang(+5 more)

Abstract: We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition. Infinigen offers broad coverage of objects and scenes in the natural world including plants, animals, terrains, and natural phenomena such as fire, cloud, rain, and snow. Infinigen can be used to generate unlimited, diverse training data for a wide range of computer vision tasks including object detection, semantic segmentation, optical flow, and 3D reconstruction. We expect Infinigen to be a useful resource for computer vision research and beyond. Please visit https://infinigen.org for videos, code and pre-generated data.

* Accepted to CVPR 2023, Camera Ready Version. Update 06/26/23: Change the open-source license to BSD

Neural Point Light Fields

Dec 17, 2021

Julian Ost, Issam Laradji, Alejandro Newell, Yuval Bahat, Felix Heide

Abstract: We introduce Neural Point Light Fields that represent scenes implicitly with a light field living on a sparse point cloud. Combining differentiable volume rendering with learned implicit density representations has made it possible to synthesize photo-realistic images for novel views of small scenes. As neural volumetric rendering methods require dense sampling of the underlying functional scene representation, at hundreds of samples along a ray cast through the volume, they are fundamentally limited to small scenes with the same objects projected to hundreds of training views. Promoting sparse point clouds to neural implicit light fields allows us to represent large scenes effectively with only a single implicit sampling operation per ray. These point light fields are a function of the ray direction and the local point feature neighborhood, allowing us to interpolate the light field conditioned on training images without dense object coverage and parallax. We assess the proposed method for novel view synthesis on large driving scenarios, where we synthesize realistic unseen views that existing implicit approaches fail to represent. We validate that Neural Point Light Fields make it possible to predict videos along unseen trajectories previously only feasible to generate by explicitly modeling the scene.

* 9 pages, replacement changed font of equations

Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline

Jun 09, 2021

Ankit Goyal, Hei Law, Bowei Liu, Alejandro Newell, Jia Deng

Abstract: Processing point cloud data is an important component of many real-world systems. As such, a wide variety of point-based approaches have been proposed, reporting steady benchmark improvements over time. We study the key ingredients of this progress and uncover two critical results. First, we find that auxiliary factors like different evaluation schemes, data augmentation strategies, and loss functions, which are independent of the model architecture, make a large difference in performance. The differences are large enough that they obscure the effect of architecture. When these factors are controlled for, PointNet++, a relatively older network, performs competitively with recent methods. Second, a very simple projection-based method, which we refer to as SimpleView, performs surprisingly well. It achieves results on par with or better than sophisticated state-of-the-art methods on ModelNet40 while being half the size of PointNet++. It also outperforms state-of-the-art methods on ScanObjectNN, a real-world point cloud benchmark, and demonstrates better cross-dataset generalization. Code is available at https://github.com/princeton-vl/SimpleView.

* Accepted to ICML 2021

How Useful is Self-Supervised Pretraining for Visual Tasks?

Mar 31, 2020

Alejandro Newell, Jia Deng

Abstract: Recent advances have spurred incredible progress in self-supervised pretraining for vision. We investigate what factors may play a role in the utility of these pretraining methods for practitioners. To do this, we evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks. We prepare a suite of synthetic data that enables an endless supply of annotated images as well as full control over dataset difficulty. Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows as well as how the utility changes as a function of the downstream task and the properties of the training data. We also find that linear evaluation does not correlate with finetuning performance. Code and data are available at https://www.github.com/princeton-vl/selfstudy.

* To appear in CVPR 2020

Feature Partitioning for Efficient Multi-Task Architectures

Aug 12, 2019

Alejandro Newell, Lu Jiang, Chong Wang, Li-Jia Li, Jia Deng

Abstract: Multi-task learning holds the promise of requiring less data, fewer parameters, and less time than training separate models. We propose a method to automatically search over multi-task architectures while taking resource constraints into consideration. We propose a search space that compactly represents different parameter sharing strategies. This provides more effective coverage and sampling of the space of multi-task architectures. We also present a method for quick evaluation of different architectures by using feature distillation. Together these contributions allow us to quickly optimize for efficient multi-task models. We benchmark on Visual Decathlon, demonstrating that we can automatically search for and identify multi-task architectures that effectively make trade-offs between task resource requirements while achieving a high level of final performance.

Pixels to Graphs by Associative Embedding

Mar 27, 2018

Alejandro Newell, Jia Deng

Abstract: Graphs are a useful abstraction of image content. Not only can graphs represent details about individual objects in a scene, but they can also capture the interactions between pairs of objects. We present a method for training a convolutional neural network such that it takes in an input image and produces a full graph definition. This is done end-to-end in a single stage with the use of associative embeddings. The network learns to simultaneously identify all of the elements that make up a graph and piece them together. We benchmark on the Visual Genome dataset, and demonstrate state-of-the-art performance on the challenging task of scene graph generation.

* Advances in Neural Information Processing Systems 30 (NIPS 2017)
* Updated numbers. Code and pretrained models available at https://github.com/umich-vl/px2graph

Associative Embedding: End-to-End Learning for Joint Detection and Grouping

Jun 09, 2017

Alejandro Newell, Zhiao Huang, Jia Deng

Abstract: We introduce associative embedding, a novel method for supervising convolutional neural networks for the task of detection and grouping. A number of computer vision problems can be framed in this manner including multi-person pose estimation, instance segmentation, and multi-object tracking. Usually the grouping of detections is achieved with multi-stage pipelines; instead, we propose an approach that teaches a network to simultaneously output detections and group assignments. This technique can be easily integrated into any state-of-the-art network architecture that produces pixel-wise predictions. We show how to apply this method to both multi-person pose estimation and instance segmentation and report state-of-the-art performance for multi-person pose on the MPII and MS-COCO datasets.

* Added results on MS-COCO and updated results on MPII
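
The idea of outputting group assignments alongside detections can be illustrated with a small grouping loss over per-detection "tags". The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `grouping_loss`, the scalar tags, and the Gaussian form of the push term are illustrative choices. Tags within one group are pulled toward the group mean, and the means of different groups are pushed apart.

```python
import numpy as np

def grouping_loss(tags, group_ids):
    """Illustrative pull/push loss (hypothetical helper, not a library API).

    tags      : (N,) one scalar embedding per detection
    group_ids : (N,) integer label of the group each detection belongs to
    Returns (pull, push) as floats.
    """
    tags = np.asarray(tags, dtype=float)
    group_ids = np.asarray(group_ids)
    groups = np.unique(group_ids)
    means = np.array([tags[group_ids == g].mean() for g in groups])

    # Pull term: squared distance of each tag to its own group's mean,
    # averaged within each group and then across groups.
    pull = 0.0
    for g, m in zip(groups, means):
        pull += np.mean((tags[group_ids == g] - m) ** 2)
    pull /= len(groups)

    # Push term: a penalty that is large when two different groups have
    # similar mean tags. The Gaussian form is an assumption of this sketch.
    diff = means[:, None] - means[None, :]
    penalty = np.exp(-0.5 * diff ** 2)
    push = (penalty.sum() - np.trace(penalty)) / (len(groups) ** 2)
    return float(pull), float(push)

# Two well-separated groups give a near-zero loss.
pull, push = grouping_loss([1.0, 1.0, 5.0, 5.0], [0, 0, 1, 1])
```

At inference time, detections whose tags are close to each other would be assigned to the same group; how that matching is done is outside the scope of this sketch.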

Stacked Hourglass Networks for Human Pose Estimation

Jul 26, 2016

Alejandro Newell, Kaiyu Yang, Jia Deng

Abstract: This work introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. We show how repeated bottom-up, top-down processing used in conjunction with intermediate supervision is critical to improving the performance of the network. We refer to the architecture as a "stacked hourglass" network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions. State-of-the-art results are achieved on the FLIC and MPII benchmarks, outcompeting all recent methods.
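
The "successive steps of pooling and upsampling" can be sketched structurally without any learned layers. The NumPy code below is an illustration under stated assumptions, not the published network: `process` is a hypothetical placeholder for the learned convolutional blocks (identity here), feature maps are single-channel 2D arrays, and the function names are chosen for this example. It shows only the recursive bottom-up, top-down shape with skip connections and one intermediate output per stack.

```python
import numpy as np

def max_pool2(x):
    """2x2 max pooling on an (H, W) array; H and W must be even."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbor upsampling by a factor of 2."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def hourglass(x, depth, process=lambda f: f):
    """One hourglass: pool down `depth` times, then upsample back up,
    adding a skip branch at every resolution."""
    skip = process(x)  # placeholder for learned layers
    if depth == 0:
        return skip
    low = hourglass(max_pool2(x), depth - 1, process)
    return skip + upsample2(low)

def stacked_hourglass(x, num_stacks=2, depth=3):
    """Chain several hourglasses; each stack's output is kept so that a
    loss could be applied to it (intermediate supervision)."""
    outputs = []
    for _ in range(num_stacks):
        x = hourglass(x, depth)
        outputs.append(x)
    return outputs

# Input height and width must be divisible by 2**depth.
outs = stacked_hourglass(np.ones((8, 8)), num_stacks=2, depth=3)
```

Every output in `outs` has the same spatial size as the input, which is what allows a per-pixel loss to be attached to each stack.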
