Abstract: User interface (UI) understanding has become an increasingly popular topic in recent years. So far, research has focused almost exclusively on web and mobile applications. In this paper, we introduce the harder task of computer UI understanding. To enable research in this field, we have generated a dataset of videos in which a user performs a sequence of actions, where each image shows the desktop contents at that point in time. We also present a framework composed of a synthetic sample generation pipeline, which augments the dataset with relevant characteristics, and a contrastive learning method that classifies the images in the videos. We exploit the natural conditional, tree-like relationship among the images' characteristics to regularize representation learning by addressing multiple partial tasks simultaneously. Experimental results show that the proposed framework outperforms previously proposed hierarchical multi-label contrastive losses in fine-grained UI classification.
Abstract: We study the problem of differentially private optimization with linear constraints when the right-hand side of the constraints depends on private data. This type of problem appears in many applications, especially resource allocation. Previous research provided solutions that preserved privacy but sometimes violated the constraints. In many settings, however, the constraints cannot be violated under any circumstances. To address this hard requirement, we present an algorithm that releases a nearly optimal solution satisfying the problem's constraints with probability 1. We also prove a lower bound demonstrating that the difference between the objective value of our algorithm's solution and the optimal solution is tight up to logarithmic factors among all differentially private algorithms. We conclude with experiments on real and synthetic datasets demonstrating that our algorithm can achieve nearly optimal performance while preserving privacy.
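
To make the "constraints satisfied with probability 1" requirement concrete, the following is a minimal illustrative sketch, not the algorithm from the paper, whose details the abstract does not give. It assumes one possible approach: perturb the private right-hand side b of constraints A x <= b with bounded (truncated Laplace) noise and shift it downward by the noise bound, so that the noisy right-hand side never exceeds the true one. The names `truncated_laplace`, `tighten_rhs`, and `is_feasible` and the parameters `scale` and `bound` are hypothetical. The calibration of `scale` and `bound` needed for a formal privacy guarantee is not shown.

```python
import random


def truncated_laplace(scale, bound, rng):
    """Sample Laplace(0, scale) noise conditioned on |noise| <= bound.

    Uses simple rejection sampling; this is an illustrative choice.
    """
    while True:
        # The magnitude of a Laplace variable is exponential with mean `scale`.
        magnitude = rng.expovariate(1.0 / scale)
        if magnitude <= bound:
            return magnitude if rng.random() < 0.5 else -magnitude


def tighten_rhs(b, scale, bound, rng):
    """Return a noisy right-hand side that does not exceed the true one.

    Each entry becomes b_i - bound + noise with noise in [-bound, bound],
    so it lies in [b_i - 2 * bound, b_i] (up to floating-point rounding).
    """
    return [b_i - bound + truncated_laplace(scale, bound, rng) for b_i in b]


def is_feasible(A, x, b, tol=1e-9):
    """Check A x <= b row by row, with a small numerical tolerance."""
    return all(
        sum(a * x_j for a, x_j in zip(row, x)) <= b_i + tol
        for row, b_i in zip(A, b)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[1.0, 0.0], [0.0, 1.0]]  # toy allocation: x_i <= b_i
    b = [10.0, 20.0]              # stands in for the private capacities
    b_noisy = tighten_rhs(b, scale=1.0, bound=3.0, rng=rng)
    # With an identity A, the point x = b_noisy maximizes sum(x) under the
    # tightened constraints, and it remains feasible for the true b.
    print(is_feasible(A, b_noisy, b))
```

Because any point feasible for the tightened right-hand side is also feasible for the true one, the constraint guarantee holds regardless of the noise draw. The price is a loss in objective value that grows with `bound`, which is the kind of gap the abstract's lower bound concerns.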