Abstract: Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient, but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of approximate conditional posteriors, one for each stage of the reverse process, which are estimated in closed form using the Laplace approximation. Our approximations are motivated by posterior sampling with a Gaussian prior, and inherit its simplicity and efficiency. They are asymptotically consistent and perform well empirically on a variety of contextual bandit problems.
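For reference, a minimal sketch of the Gaussian-prior baseline that this abstract builds on: exact posterior (Thompson) sampling in a linear contextual bandit, where the Gaussian posterior is available in closed form. The dimensions, noise levels, and horizon below are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Sketch: exact Thompson sampling with a Gaussian prior N(0, sigma0^2 I)
# in a linear contextual bandit with Gaussian reward noise (assumed setup).
rng = np.random.default_rng(0)
d, sigma0, sigma = 5, 1.0, 0.5
theta_star = rng.normal(size=d)                    # unknown true parameter

precision = np.eye(d) / sigma0**2                  # posterior precision matrix
b = np.zeros(d)                                    # running sum of r * x / sigma^2

for t in range(1000):
    arms = rng.normal(size=(10, d))                # candidate arm features
    cov = np.linalg.inv(precision)
    theta = rng.multivariate_normal(cov @ b, cov)  # sample from the posterior
    x = arms[np.argmax(arms @ theta)]              # act greedily w.r.t. the sample
    r = x @ theta_star + sigma * rng.normal()      # observe a noisy reward
    precision += np.outer(x, x) / sigma**2         # conjugate Gaussian update
    b += r * x / sigma**2
```

The proposed algorithm replaces this single Gaussian posterior with a chain of Laplace-approximated conditional posteriors, one per stage of the diffusion model's reverse process.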
Abstract: Reproducible research in machine learning has lately seen a welcome abundance of progress: workflows, transparency, and statistical analysis of validation and test performance. We build on these efforts and take them further. We offer a principled experimental design methodology, based on linear mixed models, to study and separate the effects of multiple factors of variation in machine learning experiments. This approach makes it possible to account for the effects of architecture, optimizer, and hyper-parameters, as well as intentional randomization and unintended lack of determinism across reruns. We illustrate this methodology by analyzing Matching Networks, Prototypical Networks, and TADAM on the miniImagenet dataset.
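A minimal sketch of the kind of linear mixed model analysis described above, using statsmodels; the column names (accuracy, arch, optimizer, seed) and the results file are illustrative assumptions, not the paper's actual factors of variation.

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per run: the measured accuracy plus the factors that produced it.
df = pd.read_csv("results.csv")  # assumed file layout

# Fixed effects for architecture and optimizer; a random intercept per
# seed group absorbs intentional randomization and rerun nondeterminism,
# which is what lets the factor effects be separated.
model = smf.mixedlm("accuracy ~ arch + optimizer", data=df, groups=df["seed"])
fit = model.fit()
print(fit.summary())
```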
Abstract: Deep learning has thrived mainly by training on large-scale datasets. However, a continual learning agent must update its model incrementally and in a sample-efficient manner. Learning semantic segmentation from few labelled samples can be a significant step toward this goal. We propose a novel method that constructs the new class weights from few labelled samples in the support set without back-propagation, while updating the previously learned classes. Inspired by work on adaptive correlation filters, we design an adaptive masked imprinted weights method. It applies masked average pooling to the output embeddings, and the pooled embedding acts as a positive proxy for the new class. Our proposed method is evaluated on the PASCAL-5i dataset and outperforms the state of the art in 5-shot semantic segmentation. Unlike previous methods, our approach does not require a second branch to estimate parameters or prototypes, and it enables the adaptation of previously learned weights. Our adaptation scheme is further evaluated on the DAVIS video segmentation benchmark and on our proposed incremental version of PASCAL, where it outperforms the baseline model.
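A minimal sketch of the masked average pooling step described above, under assumed tensor shapes (the paper's full pipeline and normalization may differ).

```python
import torch
import torch.nn.functional as F

def imprint_weight(embeddings: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """embeddings: (C, H, W) feature map from the backbone;
    mask: (H, W) binary support mask for the new class.
    Returns a unit-norm weight vector imprinted for that class."""
    mask = mask.float()
    pooled = (embeddings * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1.0)
    return F.normalize(pooled, dim=0)  # pooled embedding acts as a positive proxy

# The imprinted vector can then be appended as a new row of a 1x1
# convolutional classifier, with no back-propagation required.
```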
Abstract: Using variational Bayes neural networks, we develop an algorithm capable of accumulating knowledge into a prior from multiple different tasks. The result is a rich and meaningful prior capable of few-shot learning on new tasks. The posterior goes beyond the mean-field approximation and yields good uncertainty estimates in our experiments. Analysis on toy tasks shows that the method can learn from significantly different tasks while finding similarities among them. Experiments on Mini-Imagenet yield a new state of the art of 74.5% accuracy on 5-shot learning. Finally, we provide experiments showing that other existing methods can fail to perform well on different benchmarks.
Abstract: The recent literature on deep learning offers new tools to learn a rich probability distribution over high-dimensional data such as images or sounds. In this work, we investigate the possibility of learning the prior distribution over neural network parameters using such tools. Our resulting variational Bayes algorithm generalizes well to new tasks, even when very few training examples are provided. Furthermore, this learned prior allows the model to extrapolate correctly far from a given task's training data, as we show on a meta-dataset of periodic signals.
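A minimal sketch, in the spirit of the two abstracts above, of a mean-field variational posterior trained against a learned Gaussian prior; the single weight vector and diagonal Gaussians are simplifying assumptions, not the papers' exact models.

```python
import torch
import torch.nn.functional as F

d = 20
post_mu = torch.zeros(d, requires_grad=True)    # variational posterior mean
post_rho = torch.zeros(d, requires_grad=True)   # softplus(rho) -> posterior std
prior_mu = torch.zeros(d, requires_grad=True)   # prior parameters, learned
prior_rho = torch.zeros(d, requires_grad=True)  # across many tasks

def kl_diag_gauss(mu_q, sig_q, mu_p, sig_p):
    # KL(N(mu_q, sig_q^2) || N(mu_p, sig_p^2)), summed over dimensions
    return (torch.log(sig_p / sig_q)
            + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5).sum()

sig_q, sig_p = F.softplus(post_rho), F.softplus(prior_rho)
w = post_mu + sig_q * torch.randn(d)            # reparameterized weight sample
# ELBO = E_q[log p(data | w)] - kl_diag_gauss(post_mu, sig_q, prior_mu, sig_p);
# maximizing it jointly over tasks accumulates knowledge into the prior.
```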
Abstract: This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average consensus. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. In general, gossip algorithms are robust to unreliable wireless conditions and time-varying network topologies. In this paper we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require nodes to have any location information. Instead, greedy updates are made possible by exploiting the broadcast nature of wireless communications. During the operation of GGE, when a node decides to gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the neighbor whose value differs most from its own. To make this selection, nodes need to know their neighbors' values; we therefore assume that all transmissions are wireless broadcasts and that nodes keep track of their neighbors' values by eavesdropping on their communications. We show that convergence of GGE is guaranteed for connected network topologies. We also study the rate of convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently outperforms randomized gossip and performs comparably to geographic gossip on moderately sized random geometric graph topologies.
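A minimal sketch of one GGE round as described above, with the network held in assumed data structures (an array of node values and a neighbor-list dict).

```python
import numpy as np

def gge_round(values, neighbors, rng):
    """values: array of current node values; neighbors: dict mapping each
    node to a list of its neighbors; rng: np.random.Generator."""
    i = rng.integers(len(values))              # node that wakes up to gossip
    # Greedy selection: the neighbor whose value differs most from node i's.
    # Neighbor values are known from eavesdropping on earlier broadcasts.
    j = max(neighbors[i], key=lambda k: abs(values[k] - values[i]))
    values[i] = values[j] = 0.5 * (values[i] + values[j])  # pairwise average
    return values
```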