Marco Fornoni

MobileNetV4 -- Universal Models for the Mobile Ecosystem

Apr 16, 2024

Danfeng Qin, Chas Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin(+4 more)

Abstract: We present the latest generation of MobileNets, MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges the Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, a multi-query attention block tailored for mobile accelerators that delivers a 39% speedup. We also introduce an optimized neural architecture search (NAS) recipe that improves MNv4 search effectiveness. Integrating UIB, Mobile MQA, and the refined NAS recipe yields a new suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, and GPUs, as well as specialized accelerators such as the Apple Neural Engine and the Google Pixel EdgeTPU, a characteristic not found in any other model tested. Finally, to further boost accuracy, we introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model reaches 87% ImageNet-1K accuracy with a Pixel 8 EdgeTPU runtime of just 3.8 ms.
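
The abstract says UIB merges four block types. As we read the MNv4 paper, UIB is an inverted bottleneck with two optional depthwise convolutions. One sits before the pointwise expansion and one sits between expansion and projection. Toggling them recovers IB, a ConvNext-like block, ExtraDW, and FFN. The sketch below only enumerates layer sequences as strings. The function name `uib_layers`, the layer labels, and the default expansion ratio of 4 are illustrative assumptions, not the paper's implementation.

```python
from typing import Dict, List, Tuple


def uib_layers(start_dw: bool, middle_dw: bool, in_ch: int, out_ch: int,
               expand_ratio: int = 4) -> List[str]:
    """Return the layer sequence of one hypothetical UIB block as strings.

    Assumption: expand_ratio=4 and the string labels are illustrative only.
    """
    hidden = in_ch * expand_ratio
    layers: List[str] = []
    if start_dw:
        layers.append(f"dw({in_ch})")           # optional depthwise conv before expansion
    layers.append(f"pw({in_ch}->{hidden})")     # pointwise expansion
    if middle_dw:
        layers.append(f"dw({hidden})")          # optional depthwise conv after expansion
    layers.append(f"pw({hidden}->{out_ch})")    # pointwise projection
    return layers


# The two switches (start_dw, middle_dw) select which known block UIB instantiates.
VARIANTS: Dict[Tuple[bool, bool], str] = {
    (False, True): "IB",
    (True, False): "ConvNext-like",
    (True, True): "ExtraDW",
    (False, False): "FFN",
}

for (start_dw, middle_dw), name in VARIANTS.items():
    print(f"{name:<14}", " -> ".join(uib_layers(start_dw, middle_dw, 32, 32)))
```

For example, the IB setting yields `pw(32->128) -> dw(128) -> pw(128->32)`. Because one parameterized block covers all four variants, a NAS procedure can search over the two switches rather than over separate block types.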

On Label Granularity and Object Localization

Jul 20, 2022

Elijah Cole, Kimberly Wilber, Grant Van Horn, Xuan Yang, Marco Fornoni, Pietro Perona, Serge Belongie, Andrew Howard, Oisin Mac Aodha

Abstract: Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation we introduce iNatLoc500, a new large-scale fine-grained benchmark dataset for WSOL. Surprisingly, we find that choosing the right training label granularity provides a much larger performance boost than choosing the best WSOL algorithm. We also show that changing the label granularity can significantly improve data efficiency.

* ECCV 2022
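
The abstract's "animal, bird, or great horned owl" question amounts to choosing a depth in a label taxonomy. The following is a minimal sketch of relabeling the same images at different granularities by walking a child-to-parent map. The taxonomy, `path_from_root`, and `relabel` are hypothetical. They do not reproduce iNatLoc500's hierarchy or the paper's relabeling protocol.

```python
from typing import Dict, List

# Hypothetical child -> parent taxonomy (illustrative, not iNatLoc500's real hierarchy).
PARENT: Dict[str, str] = {
    "great horned owl": "owl",
    "barn owl": "owl",
    "owl": "bird",
    "mallard": "bird",
    "bird": "animal",
    "red fox": "animal",
}


def path_from_root(label: str) -> List[str]:
    """Return the chain of labels from the taxonomy root down to `label`."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def relabel(label: str, depth: int) -> str:
    """Coarsen `label` to the node at `depth` below the root (leaves stay as-is)."""
    path = path_from_root(label)
    return path[min(depth, len(path) - 1)]


fine_labels = ["great horned owl", "barn owl", "mallard", "red fox"]
for depth in range(4):
    coarse = [relabel(lbl, depth) for lbl in fine_labels]
    print(depth, coarse, "->", len(set(coarse)), "classes")
```

Depth 0 collapses everything to a single "animal" class, while depth 3 keeps the four species. A WSOL method would then be trained with image-level labels taken from one chosen depth.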

Bridging the Gap Between Object Detection and User Intent via Query-Modulation

Jun 18, 2021

Marco Fornoni, Chaochao Yan, Liangchen Luo, Kimberly Wilber, Alex Stark, Yin Cui, Boqing Gong, Andrew Howard

Abstract: When interacting with objects through cameras or pictures, users often have a specific intent; for example, they may want to perform a visual search. However, most object detection models ignore the user intent and rely on image pixels as their only input. This often leads to incorrect results, such as no high-confidence detection on the object of interest, or a detection with the wrong class label. In this paper we investigate techniques to modulate standard object detectors so that they explicitly account for the user intent, expressed as an embedding of a simple query. Compared to standard object detectors, query-modulated detectors perform better at detecting objects for a given label of interest. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors can also outperform specialized referring expression recognition systems. Furthermore, they can be trained to solve both query-modulated detection and standard object detection simultaneously.
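
The abstract mentions training data "synthesized from standard object detection annotations". One plausible reading is sketched below. Each class present in an image becomes a query, and only the boxes of that class are kept as targets. This is a minimal sketch under stated assumptions. The `Box` type, the function name, and the extra negative queries for absent classes with no targets are our illustrative choices, not the paper's documented pipeline.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    label: str
    xyxy: Tuple[float, float, float, float]


def synthesize_query_examples(boxes: List[Box], vocabulary: List[str],
                              num_negatives: int = 1,
                              seed: int = 0) -> List[Tuple[str, List[Box]]]:
    """Turn one image's detection annotations into (query, target boxes) pairs.

    Hypothetical helper; the negative-query step is an assumption.
    """
    rng = random.Random(seed)
    present = sorted({b.label for b in boxes})
    examples: List[Tuple[str, List[Box]]] = []
    # Positive queries: one per class in the image; targets are only that class's boxes.
    for query in present:
        examples.append((query, [b for b in boxes if b.label == query]))
    # Negative queries (assumption): absent classes, which should produce no detections.
    absent = [c for c in vocabulary if c not in present]
    for query in rng.sample(absent, min(num_negatives, len(absent))):
        examples.append((query, []))
    return examples


boxes = [Box("cat", (10, 10, 50, 50)), Box("dog", (60, 20, 120, 90)),
         Box("cat", (130, 40, 170, 80))]
vocab = ["cat", "dog", "car", "person"]
for query, targets in synthesize_query_examples(boxes, vocab):
    print(query, len(targets))
```

Under this scheme every existing detection dataset yields several query-conditioned examples per image. That is one way large-scale training data could be obtained without new annotations.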

SpotPatch: Parameter-Efficient Transfer Learning for Mobile Object Detection

Jan 04, 2021

Keren Ye, Adriana Kovashka, Mark Sandler, Menglong Zhu, Andrew Howard, Marco Fornoni

Abstract: Deep-learning-based object detectors are commonly deployed on mobile devices to solve a variety of tasks. For maximum accuracy, each detector is usually trained to solve a single specific task and comes with a completely independent set of parameters. While this guarantees high performance, it is also highly inefficient, as each model has to be downloaded and stored separately. In this paper we address the question: can task-specific detectors be trained and represented as a shared set of weights plus a very small set of additional weights for each task? The main contributions of this paper are: 1) we perform the first systematic study of parameter-efficient transfer learning techniques for object detection problems; 2) we propose a technique to learn a model patch whose size depends on the difficulty of the task to be learned, and we validate our approach on 10 different object detection tasks. Our approach achieves accuracy similar to that of previously proposed approaches while being significantly more compact.

* Accepted at ACCV 2020 (Oral)
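
The "shared weights plus a small per-task patch" idea can be illustrated with simple storage arithmetic. The sketch below assumes a patch just replaces whole named layers of the shared model, and that harder tasks patch more layers. SpotPatch's actual patch parameterization, and how it decides what to patch, are described in the paper and not reproduced here. All layer names and sizes are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # layer name -> flat parameter list (illustrative)


def apply_patch(shared: Weights, patch: Weights) -> Weights:
    """Build a task model: shared weights with the patched layers overridden."""
    unknown = set(patch) - set(shared)
    if unknown:
        raise KeyError(f"patch refers to unknown layers: {sorted(unknown)}")
    merged = dict(shared)  # shallow copy; the shared dict itself is not modified
    merged.update(patch)
    return merged


def num_params(weights: Weights) -> int:
    return sum(len(v) for v in weights.values())


def storage(shared: Weights, patches: List[Weights]) -> Dict[str, int]:
    """Compare storing one full model per task with storing shared weights plus patches."""
    return {
        "independent": len(patches) * num_params(shared),
        "shared_plus_patches": num_params(shared) + sum(num_params(p) for p in patches),
    }


# Hypothetical model: an easy task patches only the head, a harder one also block3.
shared = {"block1": [0.0] * 400, "block2": [0.0] * 400,
          "block3": [0.0] * 200, "head": [0.0] * 50}
easy = {"head": [1.0] * 50}
hard = {"block3": [1.0] * 200, "head": [1.0] * 50}

print(storage(shared, [easy, hard]))
hard_model = apply_patch(shared, hard)
```

Here two independent models would need 2,100 parameters, while the shared model plus both patches needs 1,350. The gap widens as more tasks share the same backbone.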
