Abstract: In-image machine translation translates text embedded in an image and presents the result as an image. Although the task has many applications, such as film poster translation and everyday scene image translation, existing methods frequently neglect consistency. We argue that two types of consistency must be upheld in this task: translation consistency and image generation consistency. The former requires incorporating image information during translation; the latter requires that the style of the generated text-image match the original image while the background stays intact. To meet these requirements, we introduce a novel two-stage framework named HCIIT (High-Consistency In-Image Translation): the first stage performs text-image translation with a multimodal multilingual large language model, and the second stage performs image backfilling with a diffusion model. Chain-of-thought learning is used in the first stage to strengthen the model's ability to leverage image information during translation. In the second stage, a diffusion model trained for style-consistent text-image generation keeps the text style uniform within the image and preserves background details. We curate a dataset of 400,000 style-consistent pseudo text-image pairs for model training. Results on both curated test sets and authentic image test sets validate the effectiveness of our framework in ensuring consistency and producing high-quality translated images.
Abstract: Epidemic prediction is of practical significance in public health, enabling early intervention, resource allocation, and strategic planning. However, privacy concerns often hinder the sharing of health data among institutions, limiting the development of accurate prediction models. In this paper, we develop a general privacy-preserving framework for node-level epidemic prediction on networks based on federated learning (FL). We model the spatio-temporal spread of epidemics across multiple data-isolated subnetworks, where each node state represents the aggregate epidemic severity within a community. We then propose two models for federated epidemic prediction: a purely temporal LSTM model and a spatio-temporal model, the Spatio-Temporal Graph Attention Network (STGAT). Extensive experiments are conducted on various epidemic processes using a practical airline network, offering a comprehensive assessment of FL efficacy under diverse scenarios. By introducing the efficacy energy metric to measure system robustness under various client configurations, we systematically explore key factors influencing FL performance, including the number of clients, aggregation strategies, graph partitioning, and missing infection reports. Numerical results show that STGAT excels at capturing spatio-temporal dependencies in dynamic processes, whereas LSTM performs well on simpler patterns. Moreover, our findings highlight the importance of balancing feature consistency and volume uniformity among clients, as well as the prediction dilemma between information richness and the intrinsic stochasticity of dynamic processes. This study offers practical insights into the efficacy of FL in epidemic management and demonstrates the potential of FL to address broader collective dynamics.
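As a rough illustration of what an aggregation strategy in such a federated setup can look like, the sketch below implements sample-size-weighted parameter averaging in the style of FedAvg. The abstract does not say which aggregation strategies were evaluated, so the choice of FedAvg, the function name `fedavg`, and the dictionary-of-lists parameter layout are all assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    This is a FedAvg-style rule chosen for illustration; it is not taken
    from the paper summarized above.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        # Weighted mean of each scalar entry across all clients.
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged
```

Under this rule a client holding three times as many samples pulls the global parameters three times as hard, which is one way the volume uniformity among clients mentioned above can affect the aggregated model.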
Abstract: Leveraging large language models for machine translation has demonstrated promising results. However, this requires the large language model to handle both the source and target languages. When it is hard to find a large model that supports the desired languages, resorting to continual learning becomes costly. To reduce this cost, we propose RD (Relay Decoding), an approach that concatenates two distinct large models that individually support the source and target languages. By adding a simple mapping layer to connect the two models and training with a limited amount of parallel data, we achieve superior results on the machine translation task. Experiments on the Multi30k and WikiMatrix datasets validate the effectiveness of the proposed method.
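To make the idea of a mapping layer between two models concrete, here is a minimal sketch that projects hidden vectors of one width (`d_src`) to another width (`d_tgt`) with a single linear map. The abstract says only that the layer is "simple", so the linear form, the initialization, and the names `init_mapping` and `map_hidden_states` are assumptions; training on parallel data and the two language models themselves are not shown.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def init_mapping(d_src: int, d_tgt: int, seed: int = 0) -> Tuple[Matrix, List[float]]:
    """Create a small random d_src x d_tgt weight matrix and a zero bias.

    Hypothetical helper: the real layer's shape and initialization are
    not described in the abstract.
    """
    rng = random.Random(seed)
    scale = 1.0 / (d_src ** 0.5)
    weight = [[rng.uniform(-scale, scale) for _ in range(d_tgt)] for _ in range(d_src)]
    bias = [0.0] * d_tgt
    return weight, bias


def map_hidden_states(hidden: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Project each source-side vector (length d_src) to length d_tgt."""
    return [
        [
            sum(h[i] * weight[i][j] for i in range(len(h))) + bias[j]
            for j in range(len(bias))
        ]
        for h in hidden
    ]


# Usage: map two 4-dimensional vectors into a 6-dimensional space.
weight, bias = init_mapping(d_src=4, d_tgt=6)
projected = map_hidden_states([[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, 0.0, 0.0]], weight, bias)
```

In a setup like the one described, only a small layer of this kind would need to be trained, which is consistent with the abstract's claim that a limited amount of parallel data suffices.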
Abstract: Although large language models (LLMs) have shown surprising language understanding and generation capabilities, they have yet to bring a revolutionary advance to machine translation. One potential cause of this limited performance is the misalignment between translation-specific understanding and general understanding inside LLMs. To align the translation-specific understanding with the general one, we propose a novel translation process, xIoD (Cross-Lingual Interpretation of Difficult words), which explicitly uses the model's general understanding of the content that is understood inconsistently to guide the translation. Specifically, xIoD performs cross-lingual interpretation of difficult-to-translate words and enhances the translation with the generated interpretations. Furthermore, we repurpose external quality estimation (QE) tools to address two challenges in xIoD: detecting difficult words and generating helpful interpretations. We conduct experiments on the self-constructed benchmark ChallengeMT, which consists of cases on which multiple SOTA translation systems consistently underperform. Experimental results show the effectiveness of xIoD, which improves translation quality by up to +3.85 COMET.