Abstract:Current metrics for text-to-image models typically rely on statistical metrics which inadequately represent the real preference of humans. Although recent work attempts to learn these preferences via human annotated images, they reduce the rich tapestry of human preference to a single overall score. However, the preference results vary when humans evaluate images with different aspects. Therefore, to learn the multi-dimensional human preferences, we propose the Multi-dimensional Preference Score (MPS), the first multi-dimensional preference scoring model for the evaluation of text-to-image models. The MPS introduces the preference condition module upon CLIP model to learn these diverse preferences. It is trained based on our Multi-dimensional Human Preference (MHP) Dataset, which comprises 918,315 human preference choices across four dimensions (i.e., aesthetics, semantic alignment, detail quality and overall assessment) on 607,541 images. The images are generated by a wide range of latest text-to-image models. The MPS outperforms existing scoring methods across 3 datasets in 4 dimensions, enabling it a promising metric for evaluating and improving text-to-image generation.
Abstract:The goal of object navigation is to reach the expected objects according to visual information in the unseen environments. Previous works usually implement deep models to train an agent to predict actions in real-time. However, in the unseen environment, when the target object is not in egocentric view, the agent may not be able to make wise decisions due to the lack of guidance. In this paper, we propose a hierarchical object-to-zone (HOZ) graph to guide the agent in a coarse-to-fine manner, and an online-learning mechanism is also proposed to update HOZ according to the real-time observation in new environments. In particular, the HOZ graph is composed of scene nodes, zone nodes and object nodes. With the pre-learned HOZ graph, the real-time observation and the target goal, the agent can constantly plan an optimal path from zone to zone. In the estimated path, the next potential zone is regarded as sub-goal, which is also fed into the deep reinforcement learning model for action prediction. Our methods are evaluated on the AI2-Thor simulator. In addition to widely used evaluation metrics SR and SPL, we also propose a new evaluation metric of SAE that focuses on the effective action rate. Experimental results demonstrate the effectiveness and efficiency of our proposed method.
Abstract:This paper proposes piece-wise matching layer as a novel layer in representation learning methods for electrocardiogram (ECG) classification. Despite the remarkable performance of representation learning methods in the analysis of time series, there are still several challenges associated with these methods ranging from the complex structures of methods, the lack of generality of solutions, the need for expert knowledge, and large-scale training datasets. We introduce the piece-wise matching layer that works based on two levels to address some of the aforementioned challenges. At the first level, a set of morphological, statistical, and frequency features and comparative forms of them are computed based on each periodic part and its neighbors. At the second level, these features are modified by predefined transformation functions based on a receptive field scenario. Several scenarios of offline processing, incremental processing, fixed sliding receptive field, and event-based triggering receptive field can be implemented based on the choice of length and mechanism of indicating the receptive field. We propose dynamic time wrapping as a mechanism that indicates a receptive field based on event triggering tactics. To evaluate the performance of this method in time series analysis, we applied the proposed layer in two publicly available datasets of PhysioNet competitions in 2015 and 2017 where the input data is ECG signal. We compared the performance of our method against a variety of known tuned methods from expert knowledge, machine learning, deep learning methods, and the combination of them. The proposed approach improves the state of the art in two known completions 2015 and 2017 around 4% and 7% correspondingly while it does not rely on in advance knowledge of the classes or the possible places of arrhythmia.