Abstract: Sequential incentive marketing is an important approach for online businesses to acquire customers, increase loyalty, and boost sales. However, how to allocate incentives effectively so as to maximize the return (e.g., business objectives) under a budget constraint is less studied in the literature. This problem is technically challenging because 1) the allocation strategy has to be learned from historically logged data, which is counterfactual in nature, and 2) both the optimality and the feasibility (i.e., that cost cannot exceed budget) of the strategy need to be assessed before it is deployed to online systems. In this paper, we formulate the problem as a constrained Markov decision process (CMDP). To solve the CMDP with logged counterfactual data, we propose an efficient learning algorithm that combines bisection search and model-based planning. First, the CMDP is converted into its dual using Lagrangian relaxation, which is proved to be monotonic with respect to the dual variable. Furthermore, we show that the dual problem can be solved by policy learning, with the optimal dual variable found efficiently via bisection search (i.e., by taking advantage of the monotonicity). Lastly, we show that model-based planning can be used to effectively accelerate the joint optimization process without retraining the policy for every dual variable. Empirical results on synthetic and real marketing datasets confirm the effectiveness of our methods.
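The bisection step over the dual variable can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a single-step toy problem in which each customer has a fixed list of (reward, cost) incentive options, so the best response for a given dual variable is a simple per-customer argmax, with no policy learning or model-based planning. The function names (`best_response`, `bisect_dual`) and the upper bound `lam_hi` are hypothetical choices made for illustration. In this toy setting the total cost of the best response does not increase as the dual variable grows, which is what makes bisection applicable.

```python
def best_response(options, lam):
    """For each customer, pick the (reward, cost) option maximizing reward - lam * cost."""
    return [max(opts, key=lambda rc: rc[0] - lam * rc[1]) for opts in options]


def total_reward(choices):
    return sum(reward for reward, _ in choices)


def total_cost(choices):
    return sum(cost for _, cost in choices)


def bisect_dual(options, budget, lam_hi=100.0, tol=1e-6):
    """Find (approximately) the smallest dual variable whose best response fits the budget.

    Assumes lam_hi is large enough that the best response at lam_hi is feasible.
    """
    lam_lo = 0.0
    # If the unconstrained optimum already fits the budget, no penalty is needed.
    if total_cost(best_response(options, lam_lo)) <= budget:
        return lam_lo
    while lam_hi - lam_lo > tol:
        mid = 0.5 * (lam_lo + lam_hi)
        if total_cost(best_response(options, mid)) <= budget:
            lam_hi = mid  # feasible: try a smaller penalty
        else:
            lam_lo = mid  # over budget: penalize cost more
    return lam_hi


# Hypothetical toy data: one list of (reward, cost) options per customer.
options = [
    [(0.0, 0.0), (1.0, 1.0), (1.5, 3.0)],
    [(0.0, 0.0), (2.0, 1.0), (4.0, 2.0)],
    [(0.0, 0.0), (0.5, 1.0)],
]
lam = bisect_dual(options, budget=4.0)
choices = best_response(options, lam)
print(lam, total_reward(choices), total_cost(choices))
```

In the sequential setting described in the abstract, the per-customer argmax would be replaced by a policy learned for the given dual variable; the sketch only shows how monotonicity lets the search halve the interval at every step.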
Abstract: Few-shot segmentation (FSS) performance has been substantially improved by the introduction of episodic training and class-wise prototypes. However, the FSS problem remains challenging due to three limitations: (1) models are distracted by task-unrelated information; (2) the representation ability of a single prototype is limited; (3) class-related prototypes ignore the prior knowledge of base classes. We propose the Prior-Enhanced network with Meta-Prototypes to tackle these limitations. The prior-enhanced network leverages the support and query (pseudo-) labels in feature extraction, which guides the model to focus on the task-related features of the foreground objects and suppresses much of the noise caused by the lack of supervised knowledge. Moreover, we introduce multiple meta-prototypes to encode hierarchical features and learn class-agnostic structural information. The hierarchical features help the model highlight the decision boundary and focus on hard pixels, and the structural information learned from base classes is treated as prior knowledge for novel classes. Experiments show that our method achieves mean-IoU scores of 60.79% and 41.16% on PASCAL-$5^i$ and COCO-$20^i$, outperforming the state-of-the-art method by 3.49% and 5.64% in the 5-shot setting. Moreover, compared with its 1-shot results, our method improves 5-shot accuracy by 3.73% and 10.32% on the above two benchmarks. The source code of our method is available at https://github.com/Jarvis73/PEMP.