Maxwell Collins

University of Wisconsin-Madison

Learning to Generate Image Embeddings with User-level Differential Privacy

Nov 20, 2022

Zheng Xu, Maxwell Collins, Yuxiao Wang, Liviu Panait, Sewoong Oh, Sean Augenstein, Ting Liu, Florian Schroff, H. Brendan McMahan

Abstract: Small on-device models have been successfully trained with user-level differential privacy (DP) for next word prediction and image classification tasks in the past. However, existing methods can fail when directly applied to learn embedding models using supervised training data with a large class space. To achieve user-level DP for large image-to-embedding feature extractors, we propose DP-FedEmb, a variant of federated learning algorithms with per-user sensitivity control and noise addition, to train from user-partitioned data centralized in the datacenter. DP-FedEmb combines virtual clients, partial aggregation, private local fine-tuning, and public pretraining to achieve strong privacy-utility trade-offs. We apply DP-FedEmb to train image embedding models for faces, landmarks and natural species, and demonstrate its superior utility under the same privacy budget on the benchmark datasets DigiFace, EMNIST, GLD and iNaturalist. We further illustrate that it is possible to achieve strong user-level DP guarantees of $\epsilon<2$ while controlling the utility drop within 5%, when millions of users can participate in training.
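
The "per-user sensitivity control and noise addition" mentioned in the abstract can be illustrated with a generic clip-and-noise aggregation step. The sketch below is not DP-FedEmb itself: it omits virtual clients, partial aggregation, local fine-tuning and any privacy accounting, and the function names, `clip_norm`, `noise_multiplier` and the toy updates are all assumptions made for illustration.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale a per-user update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def private_mean(user_updates, clip_norm, noise_multiplier, rng):
    """Average clipped per-user updates after adding Gaussian noise.

    Clipping bounds how much any single user can change the sum, so the
    noise standard deviation is calibrated to that bound (clip_norm).
    """
    dim = len(user_updates[0])
    total = [0.0] * dim
    for update in user_updates:
        for i, x in enumerate(clip_update(update, clip_norm)):
            total[i] += x
    sigma = noise_multiplier * clip_norm
    n = len(user_updates)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]

# Hypothetical toy updates from three users (two model parameters each).
updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
rng = random.Random(0)
noisy_mean = private_mean(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

With only three users the noise dominates the average; the abstract's point about millions of participating users reflects that the noise added to the sum is divided by the number of users.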

k-means Mask Transformer

Jul 08, 2022

Qihang Yu, Huiyu Wang, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

Abstract: The rise of transformers in vision tasks not only advances network backbone designs, but also starts a brand-new page to achieve end-to-end image recognition (e.g., object detection and panoptic segmentation). Originating in natural language processing (NLP), transformer architectures, consisting of self-attention and cross-attention, effectively learn long-range interactions between elements in a sequence. However, we observe that most existing transformer-based vision models simply borrow the idea from NLP, neglecting the crucial difference between languages and images, particularly the extremely large sequence length of spatially flattened pixel features. This subsequently impedes the learning in cross-attention between pixel features and object queries. In this paper, we rethink the relationship between pixels and object queries and propose to reformulate the cross-attention learning as a clustering process. Inspired by the traditional k-means clustering algorithm, we develop a k-means Mask Xformer (kMaX-DeepLab) for segmentation tasks, which not only improves the state-of-the-art, but also enjoys a simple and elegant design. As a result, our kMaX-DeepLab achieves a new state-of-the-art performance on COCO val set with 58.0% PQ, and Cityscapes val set with 68.4% PQ, 44.0% AP, and 83.5% mIoU without test-time augmentation or external datasets. We hope our work can shed some light on designing transformers tailored for vision tasks. Code and models are available at https://github.com/google-research/deeplab2

* ECCV 2022. Code and models are available at https://github.com/google-research/deeplab2

CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation

Jun 17, 2022

Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

Abstract: We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-based framework for panoptic segmentation designed around clustering. It rethinks the existing transformer architectures used in segmentation and detection; CMT-DeepLab considers the object queries as cluster centers, which fill the role of grouping the pixels when applied to segmentation. The clustering is computed with an alternating procedure, by first assigning pixels to the clusters by their feature affinity, and then updating the cluster centers and pixel features. Together, these operations comprise the Clustering Mask Transformer (CMT) layer, which produces cross-attention that is denser and more consistent with the final segmentation task. CMT-DeepLab improves the performance over prior art significantly by 4.4% PQ, achieving a new state-of-the-art of 55.7% PQ on the COCO test-dev set.

* CVPR 2022 Oral
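
The alternating procedure described in the abstract (assign pixels to clusters by feature affinity, then update the cluster centers) can be sketched as a single soft-clustering step. This is a minimal illustration and not the CMT layer: the dot-product affinity, the softmax over clusters, and the names `clustering_step`, `pixels` and `centers` are assumptions, and the pixel-feature update mentioned in the abstract is left out.

```python
import math

def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def clustering_step(pixels, centers):
    """One alternating step: assign pixels to centers, then update centers."""
    # Assignment: each pixel spreads its weight over the clusters
    # according to its dot-product affinity with every center.
    assign = [softmax([dot(p, c) for c in centers]) for p in pixels]
    # Update: each center becomes the assignment-weighted mean of the pixels.
    dim = len(pixels[0])
    new_centers = []
    for k in range(len(centers)):
        weight = sum(a[k] for a in assign)
        new_centers.append(
            [sum(a[k] * p[d] for a, p in zip(assign, pixels)) / weight
             for d in range(dim)]
        )
    return assign, new_centers

# Hypothetical toy data: four 2-D pixel features and two cluster centers.
pixels = [[10.0, 0.0], [9.0, 0.0], [0.0, 10.0], [0.0, 9.0]]
centers = [[1.0, 0.0], [0.0, 1.0]]
assign, centers = clustering_step(pixels, centers)
labels = [row.index(max(row)) for row in assign]  # hard label per pixel
```

Reading each row of `assign` as a per-pixel distribution over clusters is what lets the assignment double as a segmentation mask: taking the most likely cluster per pixel gives one mask per center.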

STEP: Segmenting and Tracking Every Pixel

Feb 23, 2021

Mark Weber, Jun Xie, Maxwell Collins, Yukun Zhu, Paul Voigtlaender, Hartwig Adam, Bradley Green, Andreas Geiger, Bastian Leibe, Daniel Cremers (+3 more)

Abstract: In this paper, we tackle video panoptic segmentation, a task that requires assigning semantic classes and track identities to all pixels in a video. To study this important problem in a setting that requires a continuous interpretation of sensory data, we present a new benchmark: Segmenting and Tracking Every Pixel (STEP), encompassing two datasets, KITTI-STEP and MOTChallenge-STEP, together with a new evaluation metric. Our work is the first that targets this task in a real-world setting that requires dense interpretation in both spatial and temporal domains. As the ground truth for this task is difficult and expensive to obtain, existing datasets are either constructed synthetically or only sparsely annotated within short video clips. By contrast, our datasets contain long video sequences, providing challenging examples and a test-bed for studying long-term pixel-precise segmentation and tracking. For measuring the performance, we propose a novel evaluation metric, Segmentation and Tracking Quality (STQ), that fairly balances semantic and tracking aspects of this task and is suitable for evaluating sequences of arbitrary length. We will make our datasets, metric, and baselines publicly available.

* Datasets, metric, and baselines will be made publicly available soon

Pose2Instance: Harnessing Keypoints for Person Instance Segmentation

Apr 04, 2017

Subarna Tripathi, Maxwell Collins, Matthew Brown, Serge Belongie

Abstract: Human keypoints are a well-studied representation of people. We explore how to use keypoint models to improve instance-level person segmentation. The main idea is to harness the notion of a distance transform of oracle-provided keypoints or estimated keypoint heatmaps as a prior for the person instance segmentation task within a deep neural network. For training and evaluation, we consider all those images from COCO where both instance segmentation and human keypoint annotations are available. We first show how oracle keypoints can boost the performance of an existing human segmentation model during inference without any training. Next, we propose a framework to directly learn a deep instance segmentation model conditioned on human pose. Experimental results show that at various Intersection over Union (IoU) thresholds, in a constrained environment with oracle keypoints, the instance segmentation accuracy achieves 10% to 12% relative improvements over a strong baseline of oracle bounding boxes. In a more realistic environment, without the oracle keypoints, the proposed deep person instance segmentation model conditioned on human pose achieves 3.8% to 10.5% relative improvements compared with its strongest baseline of a deep network trained only for segmentation.
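
The distance transform of keypoints mentioned in the abstract can be illustrated with a brute-force computation: every pixel stores its Euclidean distance to the nearest keypoint. The sketch below only shows that idea under stated assumptions. The exponential mapping in `distance_prior`, its `scale` parameter, the function names and the toy keypoints are hypothetical and are not taken from the paper.

```python
import math

def keypoint_distance_map(height, width, keypoints):
    """Return a height x width grid of distances to the nearest keypoint.

    keypoints is a list of (row, col) pixel coordinates.
    """
    return [
        [min(math.hypot(r - kr, c - kc) for kr, kc in keypoints)
         for c in range(width)]
        for r in range(height)
    ]

def distance_prior(dist_map, scale):
    """Map distances to (0, 1] so pixels near a keypoint score close to 1.

    The exponential decay is an illustrative choice, not the paper's.
    """
    return [[math.exp(-d / scale) for d in row] for row in dist_map]

# Hypothetical toy example: two keypoints on a 3 x 4 image.
dist_map = keypoint_distance_map(3, 4, [(0, 0), (2, 3)])
prior = distance_prior(dist_map, scale=2.0)
```

A map like `prior` could be concatenated to the image channels as an extra input, which is one simple way a network can be conditioned on pose.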

Incorporating Domain Knowledge in Matching Problems via Harmonic Analysis

Jun 27, 2012

Deepti Pachauri, Maxwell Collins, Vikas Singh, Risi Kondor

Abstract: Matching one set of objects to another is a ubiquitous task in machine learning and computer vision that often reduces to some form of the quadratic assignment problem (QAP). The QAP is known to be notoriously hard, both in theory and in practice. Here, we investigate if this difficulty can be mitigated when some additional information is available: (a) that all QAP instances of interest come from the same application, and (b) that the correct solution for a set of such QAP instances is given. We propose a new approach to accelerate the solution of QAPs based on learning parameters for a modified objective function from prior QAP instances. A key feature of our approach is that it takes advantage of the algebraic structure of permutations, in conjunction with special methods for optimizing functions over the symmetric group $S_n$ in Fourier space. Experiments show that in practical domains the new method can outperform existing approaches.

* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
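
To make the quadratic assignment problem concrete, the sketch below evaluates one common form of the objective (the Koopmans-Beckmann form) and solves a tiny instance by exhaustive search over permutations. It does not implement the paper's learned objective or its Fourier-space optimization; the matrices `flow` and `dist` and the function names are assumptions for illustration.

```python
from itertools import permutations

def qap_cost(flow, dist, perm):
    """Koopmans-Beckmann objective: sum over i, j of flow[i][j] * dist[perm[i]][perm[j]]."""
    n = len(perm)
    return sum(
        flow[i][j] * dist[perm[i]][perm[j]]
        for i in range(n)
        for j in range(n)
    )

def brute_force_qap(flow, dist):
    """Search all n! permutations; feasible only for very small n."""
    n = len(flow)
    return min(permutations(range(n)), key=lambda p: qap_cost(flow, dist, p))

# Hypothetical 3 x 3 instance with symmetric flow and distance matrices.
flow = [[0, 5, 1], [5, 0, 2], [1, 2, 0]]
dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
best = brute_force_qap(flow, dist)
best_cost = qap_cost(flow, dist, best)
```

The search space grows as n!, which is why exhaustive search stops being practical almost immediately and why approaches that exploit structure in the permutation group are of interest.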
