Gedeon Muhawenayo

Thoth, Inria, UGA, CNRS, Grenoble INP, LJK

An All-MLP Sequence Modeling Architecture That Excels at Copying

Jun 23, 2024

Chenwei Cui, Zehao Yan, Gedeon Muhawenayo, Hannah Kerner

Abstract: Recent work demonstrated Transformers' ability to efficiently copy strings of exponential sizes, distinguishing them from other architectures. We present the Causal Relation Network (CausalRN), an all-MLP sequence modeling architecture that can match Transformers on the copying task. Extending Relation Networks (RNs), we implemented key innovations to support autoregressive sequence modeling while maintaining computational feasibility. We discovered that exponentially-activated RNs are reducible to linear time complexity, and that pre-activation normalization induces an infinitely growing memory pool, similar to a KV cache. In an ablation study, we found that both exponential activation and pre-activation normalization are indispensable for Transformer-level copying. Our findings provide new insights into what actually constitutes strong in-context retrieval.

* Accepted by ICML 2024 Next Generation of Sequence Modeling Architectures Workshop
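
The linear-time claim in the CausalRN abstract can be illustrated with a small sketch. It assumes that the pairwise pre-activation is a sum of one term per token, so that exp(a + b) = exp(a) * exp(b) lets the causal pairwise sum collapse into a running prefix sum. The scalar scores `a` and `b` and both function names are hypothetical stand-ins, and this is not the authors' implementation, which works on vectors and includes normalization.

```python
import math

def pairwise_quadratic(a, b):
    """Naive causal pairwise sum: out[i] = sum over j <= i of exp(a[i] + b[j])."""
    return [
        sum(math.exp(a[i] + b[j]) for j in range(i + 1))
        for i in range(len(a))
    ]

def pairwise_linear(a, b):
    """Same quantity in a single pass, using exp(a + b) = exp(a) * exp(b)."""
    out = []
    running = 0.0  # prefix sum of exp(b[j]) for all j seen so far
    for a_i, b_i in zip(a, b):
        running += math.exp(b_i)
        out.append(math.exp(a_i) * running)
    return out

# Hypothetical per-token scores, for illustration only.
a = [0.1, -0.3, 0.5, 0.2]
b = [0.4, 0.0, -0.2, 0.3]
slow = pairwise_quadratic(a, b)
fast = pairwise_linear(a, b)
```

The running sum in `pairwise_linear` is the only state carried between positions, which is why the cost grows linearly with sequence length in this simplified setting.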


Entropic Descent Archetypal Analysis for Blind Hyperspectral Unmixing

Sep 26, 2022

Alexandre Zouaoui, Gedeon Muhawenayo, Behnood Rasti, Jocelyn Chanussot, Julien Mairal


Abstract: In this paper, we introduce a new algorithm based on archetypal analysis for blind hyperspectral unmixing, assuming linear mixing of endmembers. Archetypal analysis is a natural formulation for this task. This method does not require the presence of pure pixels (i.e., pixels containing a single material) but instead represents endmembers as convex combinations of a few pixels present in the original hyperspectral image. Our approach leverages an entropic gradient descent strategy, which (i) provides better solutions for hyperspectral unmixing than traditional archetypal analysis algorithms, and (ii) leads to efficient GPU implementations. Since running a single instance of our algorithm is fast, we also propose an ensembling mechanism, along with an appropriate model selection procedure, that makes our method robust to hyper-parameter choices while keeping the computational complexity reasonable. Using six standard real datasets, we show that our approach outperforms state-of-the-art matrix factorization and recent deep learning methods. We also provide an open-source PyTorch implementation: https://github.com/inria-thoth/EDAA.
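
As a rough illustration of what an entropic descent update looks like, the sketch below applies a multiplicative (exponentiated gradient) step that keeps a vector of abundances on the probability simplex. It covers only a simplified sub-problem, assuming the endmembers are already known and fitting a single pixel under a least-squares loss. The function names, step size, and toy data are hypothetical, and this is not code from the EDAA repository.

```python
import math

def entropic_descent_step(a, grad, step_size):
    """One multiplicative update that keeps `a` on the probability simplex."""
    # Subtracting min(grad) does not change the normalized result,
    # but keeps every exponent <= 0 so exp() cannot overflow.
    shift = min(grad)
    weights = [a_k * math.exp(-step_size * (g_k - shift))
               for a_k, g_k in zip(a, grad)]
    total = sum(weights)
    return [w / total for w in weights]

def unmix_pixel(pixel, endmembers, steps=500, step_size=0.5):
    """Estimate abundances of known endmembers for one pixel (hypothetical helper)."""
    k = len(endmembers)
    n_bands = len(pixel)
    a = [1.0 / k] * k  # start from the uniform mixture
    for _ in range(steps):
        # Residual of the linear mixing model, band by band: E a - x.
        residual = [
            sum(a[j] * endmembers[j][band] for j in range(k)) - pixel[band]
            for band in range(n_bands)
        ]
        # Gradient of 0.5 * ||E a - x||^2 with respect to a.
        grad = [
            sum(endmembers[j][band] * residual[band] for band in range(n_bands))
            for j in range(k)
        ]
        a = entropic_descent_step(a, grad, step_size)
    return a

# Toy data: two endmembers over three bands, mixed 70% / 30%.
endmembers = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pixel = [0.7, 0.3, 0.0]
abundances = unmix_pixel(pixel, endmembers)
```

Because the update multiplies and renormalizes, the abundances stay non-negative and sum to one without any explicit projection step.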


Compressed Object Detection

Feb 04, 2021

Gedeon Muhawenayo, Georgia Gkioxari


Abstract: Deep learning approaches have achieved unprecedented performance in visual recognition tasks such as object detection and pose estimation. However, state-of-the-art models have millions of parameters represented as floats, which makes them computationally expensive and constrains their deployment on hardware such as mobile phones and IoT nodes. Activations of deep neural networks commonly tend to be sparse, indicating that models are over-parametrized with redundant neurons. Model compression techniques, such as pruning and quantization, have recently shown promising results by reducing model complexity with little loss in performance. In this work, we extended pruning, a compression technique that discards unnecessary model connections, and weight sharing techniques to the task of object detection. With our approach, we are able to compress a state-of-the-art object detection model by 30.0% without a loss in performance. We also show that our compressed model can be easily initialized with existing pre-trained weights, and thus is able to fully utilize published state-of-the-art model zoos.
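
To make the idea of discarding unnecessary connections concrete, here is a minimal sketch of pruning by weight magnitude. The abstract does not state which pruning criterion the authors use, so the magnitude rule is an assumption, and the flat list of weights and the function name are hypothetical stand-ins for a real layer.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out roughly `fraction` of the weights, smallest magnitudes first."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    n_prune = int(len(weights) * fraction)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)  # copy, so the input is left untouched
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

# Hypothetical weights for one small layer.
layer = [0.5, -0.05, 0.3, 0.01, -0.8, 0.2]
sparse_layer = prune_by_magnitude(layer, 0.5)
```

In practice a pruned model is usually fine-tuned afterwards so the remaining connections can compensate for the removed ones.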
