Abstract: Learning from feedback has been shown to improve the alignment between text prompts and images in text-to-image diffusion models. However, because the feedback is not focused on specific content, especially object type and quantity, these techniques struggle to match text and images accurately for prompts that specify them. To address this issue, we propose an efficient fine-tuning method with specific reward objectives, consisting of three stages. First, object detection is applied to images generated by the diffusion model to obtain object categories and quantities, and category and quantity confidences are derived from the detection results and the given prompts. Next, we define a novel matching score, based on these confidences, to measure text-image alignment; it guides feedback learning in the form of a reward function. Finally, we fine-tune the diffusion model by backpropagating the gradients of the reward function so that it generates semantically related images. Unlike previous feedback methods, which focus on overall matching, we place more emphasis on the accuracy of entity categories and quantities. In addition, we construct a text-to-image dataset for studying compositional generation, comprising 1.7K text-image pairs with diverse combinations of entities and quantities. Experimental results on this benchmark show that our model outperforms other SOTA methods in both alignment and fidelity. Our model can also serve as a metric for evaluating text-image alignment in other models. All code and the dataset are available at https://github.com/kingniu0329/Visions.
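The abstract does not give the formula for the matching score, so the following is only a minimal sketch of how category and quantity confidences could be combined into one alignment score. The function name `matching_score`, the detection threshold, the count-ratio quantity confidence, and the product-then-mean combination are all assumptions made for illustration, not the authors' definition. The sketch is plain Python and not differentiable, so it illustrates the scoring idea only, not the gradient-based fine-tuning.

```python
from collections import defaultdict


def matching_score(requested, detections, threshold=0.5):
    """Hypothetical text-image matching score (an assumption, not the paper's formula).

    requested:  dict mapping category name -> requested count (each count >= 1)
    detections: list of (category, confidence) pairs from an object detector
    """
    if not requested:
        raise ValueError("requested must contain at least one category")

    # Group confident detections by category.
    by_category = defaultdict(list)
    for category, confidence in detections:
        if confidence >= threshold:
            by_category[category].append(confidence)

    scores = []
    for category, wanted in requested.items():
        confidences = sorted(by_category.get(category, []), reverse=True)
        found = len(confidences)
        # Category confidence: mean confidence of the best `wanted` detections;
        # missing objects contribute zero.
        category_conf = sum(confidences[:wanted]) / wanted
        # Quantity confidence: 1.0 when counts agree, shrinking as they diverge.
        quantity_conf = min(found, wanted) / max(found, wanted)
        scores.append(category_conf * quantity_conf)

    # Average over the requested categories.
    return sum(scores) / len(scores)


if __name__ == "__main__":
    prompt = {"dog": 2, "cat": 1}
    boxes = [("dog", 0.9), ("dog", 0.8), ("cat", 0.7)]
    print(matching_score(prompt, boxes))  # roughly 0.775
```

Under these assumptions, an image with a surplus or shortfall of a requested object scores lower than one with the exact count, which is the behavior a quantity-aware reward would need.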
Abstract: A variety of filters with track-before-detect (TBD) strategies, including the probability hypothesis density (PHD) filter, have been developed and applied to low signal-to-noise ratio (SNR) scenarios. The assumptions of the standard point measurement model used by detect-before-track (DBT) strategies are not suitable for the amplitude echo model used by TBD strategies. Nevertheless, despite the different models and mismatched assumptions, existing TBD-PHD filters simply apply the measurement update formulas of the DBT-PHD filter mechanically. In this paper, a principled closed-form solution of the TBD-PHD filter is derived from the Kullback-Leibler divergence minimization criterion, finite set statistics theory, and a rigorous application of Bayes' rule. Furthermore, we emphasize that the PHD filter is conjugate to the Poisson prior under TBD strategies. Next, a capping operation is devised to handle the divergence of the target number estimate as the SNR increases. Moreover, sequential Monte Carlo implementations of the dynamic and amplitude echo models are proposed for radar systems. Finally, Monte Carlo experiments demonstrate good performance in Rayleigh noise and low-SNR scenarios.
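In a sequential Monte Carlo PHD filter, the expected number of targets is the integral of the PHD, which is approximated by the sum of the particle weights. The abstract does not specify the form of the capping operation, so the sketch below shows only one simple possibility: clamping the weight sum to a maximum and rescaling the weights proportionally. The function name `estimate_target_count`, the `max_targets` parameter, and the proportional rescaling are assumptions for illustration, not the paper's procedure.

```python
def estimate_target_count(weights, max_targets=None):
    """Estimate the target count from SMC-PHD particle weights.

    weights:     list of non-negative particle weights after the update step
    max_targets: hypothetical cap on the estimate (an assumption, not from the paper)

    Returns (estimated_count, weights), with weights rescaled if the cap applied.
    """
    # The sum of weights approximates the integral of the PHD,
    # i.e. the expected number of targets.
    total = sum(weights)

    if max_targets is not None and total > max_targets:
        # Illustrative cap: shrink all weights by the same factor so that
        # their sum equals the cap and their relative sizes are preserved.
        scale = max_targets / total
        weights = [w * scale for w in weights]
        total = float(max_targets)

    return total, weights


if __name__ == "__main__":
    updated_weights = [0.5, 0.5, 1.0, 2.0]
    count, capped = estimate_target_count(updated_weights, max_targets=2)
    print(count, capped)
```

Proportional rescaling keeps the shape of the particle approximation unchanged and limits only its total mass; other capping rules are equally plausible given the abstract alone.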